Overview

Dataset statistics

Number of variables: 49
Number of observations: 158957
Missing cells: 1251928
Missing cells (%): 16.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 59.4 MiB
Average record size in memory: 392.0 B
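Two figures above are derived from the others. The missing-cell percentage is the share of all cells (variables × observations) that are empty. The average record size is the total memory divided by the row count. The minimal Python check below uses only the reported totals. The function names are hypothetical helpers, not part of pandas-profiling.

```python
def missing_cell_pct(missing_cells: int, n_vars: int, n_obs: int) -> float:
    """Percentage of empty cells in an n_obs x n_vars table."""
    return 100.0 * missing_cells / (n_vars * n_obs)


def avg_record_bytes(total_mib: float, n_obs: int) -> float:
    """Average in-memory bytes per row, given the total size in MiB."""
    return total_mib * 1024 * 1024 / n_obs


pct = missing_cell_pct(1251928, 49, 158957)
rec = avg_record_bytes(59.4, 158957)
print(f"{pct:.1f}%")   # 16.1%
print(f"{rec:.1f} B")  # 391.8 B
```

The second result differs slightly from the reported 392.0 B. The 59.4 MiB total is itself rounded, and 392.0 B × 158957 rows is about 59.43 MiB.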

Variable types

Numeric: 25
Categorical: 23
Unsupported: 1

Warnings

CITY has constant value "WASHINGTON" Constant
STATE has constant value "DC" Constant
SALEDATE has a high cardinality: 6937 distinct values High cardinality
FULLADDRESS has a high cardinality: 105978 distinct values High cardinality
NATIONALGRID has a high cardinality: 105949 distinct values High cardinality
ASSESSMENT_NBHD has a high cardinality: 57 distinct values High cardinality
ASSESSMENT_SUBNBHD has a high cardinality: 121 distinct values High cardinality
CENSUS_BLOCK has a high cardinality: 3848 distinct values High cardinality
Unnamed: 0 is highly correlated with ROOMS and 1 other field High correlation
BATHRM is highly correlated with ROOMS and 3 other fields High correlation
NUM_UNITS is highly correlated with ROOMS and 2 other fields High correlation
ROOMS is highly correlated with Unnamed: 0 and 6 other fields High correlation
BEDRM is highly correlated with Unnamed: 0 and 4 other fields High correlation
AYB is highly correlated with EYB High correlation
YR_RMDL is highly correlated with CMPLX_NUM High correlation
EYB is highly correlated with AYB High correlation
PRICE is highly correlated with GBA High correlation
GBA is highly correlated with BATHRM and 4 other fields High correlation
KITCHENS is highly correlated with NUM_UNITS and 2 other fields High correlation
FIREPLACES is highly correlated with GBA High correlation
USECODE is highly correlated with NUM_UNITS and 1 other field High correlation
CMPLX_NUM is highly correlated with YR_RMDL High correlation
LIVING_GBA is highly correlated with BATHRM and 2 other fields High correlation
LATITUDE is highly correlated with CENSUS_TRACT and 1 other field High correlation
LONGITUDE is highly correlated with CENSUS_TRACT and 1 other field High correlation
CENSUS_TRACT is highly correlated with LATITUDE and 3 other fields High correlation
X is highly correlated with LONGITUDE and 1 other field High correlation
Y is highly correlated with LATITUDE and 1 other field High correlation
Unnamed: 0 is highly correlated with ROOMS and 3 other fields High correlation
BATHRM is highly correlated with ROOMS and 3 other fields High correlation
NUM_UNITS is highly correlated with KITCHENS and 1 other field High correlation
ROOMS is highly correlated with Unnamed: 0 and 5 other fields High correlation
BEDRM is highly correlated with Unnamed: 0 and 5 other fields High correlation
PRICE is highly correlated with LIVING_GBA High correlation
GBA is highly correlated with BATHRM and 2 other fields High correlation
KITCHENS is highly correlated with NUM_UNITS High correlation
USECODE is highly correlated with Unnamed: 0 and 1 other field High correlation
LANDAREA is highly correlated with Unnamed: 0 and 2 other fields High correlation
LIVING_GBA is highly correlated with BATHRM and 3 other fields High correlation
LATITUDE is highly correlated with LONGITUDE and 3 other fields High correlation
LONGITUDE is highly correlated with LATITUDE and 3 other fields High correlation
CENSUS_TRACT is highly correlated with LATITUDE and 3 other fields High correlation
X is highly correlated with LATITUDE and 3 other fields High correlation
Y is highly correlated with LATITUDE and 3 other fields High correlation
Unnamed: 0 is highly correlated with ROOMS High correlation
BATHRM is highly correlated with ROOMS and 3 other fields High correlation
NUM_UNITS is highly correlated with KITCHENS High correlation
ROOMS is highly correlated with Unnamed: 0 and 5 other fields High correlation
BEDRM is highly correlated with BATHRM and 4 other fields High correlation
GBA is highly correlated with BATHRM and 2 other fields High correlation
KITCHENS is highly correlated with NUM_UNITS High correlation
LANDAREA is highly correlated with ROOMS and 1 other field High correlation
LIVING_GBA is highly correlated with BATHRM and 2 other fields High correlation
LATITUDE is highly correlated with Y High correlation
LONGITUDE is highly correlated with CENSUS_TRACT and 1 other field High correlation
CENSUS_TRACT is highly correlated with LONGITUDE and 1 other field High correlation
X is highly correlated with LONGITUDE and 1 other field High correlation
Y is highly correlated with LATITUDE High correlation
ROOMS is highly correlated with BATHRM and 9 other fields High correlation
AC is highly correlated with EXTWALL and 5 other fields High correlation
QUADRANT is highly correlated with Y and 7 other fields High correlation
EXTWALL is highly correlated with AC and 4 other fields High correlation
FIREPLACES is highly correlated with YR_RMDL High correlation
STRUCT is highly correlated with AC and 8 other fields High correlation
BATHRM is highly correlated with ROOMS and 4 other fields High correlation
ROOF is highly correlated with STRUCT and 3 other fields High correlation
Y is highly correlated with QUADRANT and 8 other fields High correlation
LANDAREA is highly correlated with ROOMS and 3 other fields High correlation
CMPLX_NUM is highly correlated with Y and 5 other fields High correlation
AYB is highly correlated with EYB and 3 other fields High correlation
SOURCE is highly correlated with ROOMS and 10 other fields High correlation
STYLE is highly correlated with STRUCT and 2 other fields High correlation
HF_BATHRM is highly correlated with ROOMS and 1 other field High correlation
WARD is highly correlated with QUADRANT and 12 other fields High correlation
LATITUDE is highly correlated with QUADRANT and 8 other fields High correlation
CNDTN is highly correlated with AC and 4 other fields High correlation
KITCHENS is highly correlated with NUM_UNITS High correlation
PRICE is highly correlated with GBA High correlation
USECODE is highly correlated with STRUCT and 1 other field High correlation
EYB is highly correlated with AC and 6 other fields High correlation
HEAT is highly correlated with AC and 4 other fields High correlation
LIVING_GBA is highly correlated with ROOMS and 2 other fields High correlation
YR_RMDL is highly correlated with FIREPLACES High correlation
BEDRM is highly correlated with ROOMS and 5 other fields High correlation
GBA is highly correlated with ROOMS and 5 other fields High correlation
CENSUS_TRACT is highly correlated with QUADRANT and 12 other fields High correlation
GRADE is highly correlated with AC and 11 other fields High correlation
X is highly correlated with QUADRANT and 10 other fields High correlation
Unnamed: 0 is highly correlated with ROOMS and 14 other fields High correlation
ASSESSMENT_NBHD is highly correlated with ROOMS and 21 other fields High correlation
NUM_UNITS is highly correlated with STRUCT and 2 other fields High correlation
GIS_LAST_MOD_DTTM is highly correlated with ROOMS and 10 other fields High correlation
LONGITUDE is highly correlated with QUADRANT and 10 other fields High correlation
ROOF is highly correlated with CITY and 3 other fields High correlation
AC is highly correlated with HEAT and 2 other fields High correlation
QUADRANT is highly correlated with CITY and 3 other fields High correlation
HEAT is highly correlated with AC and 4 other fields High correlation
CITY is highly correlated with ROOF and 16 other fields High correlation
INTWALL is highly correlated with CITY and 3 other fields High correlation
ASSESSMENT_NBHD is highly correlated with QUADRANT and 5 other fields High correlation
STATE is highly correlated with ROOF and 16 other fields High correlation
BLDG_NUM is highly correlated with CITY and 1 other field High correlation
QUALIFIED is highly correlated with CITY and 1 other field High correlation
GIS_LAST_MOD_DTTM is highly correlated with ROOF and 11 other fields High correlation
STYLE is highly correlated with CITY and 3 other fields High correlation
EXTWALL is highly correlated with CITY and 3 other fields High correlation
STRUCT is highly correlated with CITY and 3 other fields High correlation
SOURCE is highly correlated with ROOF and 11 other fields High correlation
WARD is highly correlated with QUADRANT and 3 other fields High correlation
GRADE is highly correlated with CITY and 3 other fields High correlation
CNDTN is highly correlated with CITY and 3 other fields High correlation
NUM_UNITS has 52261 (32.9%) missing values Missing
YR_RMDL has 78029 (49.1%) missing values Missing
STORIES has 52305 (32.9%) missing values Missing
SALEDATE has 26770 (16.8%) missing values Missing
PRICE has 60741 (38.2%) missing values Missing
GBA has 52261 (32.9%) missing values Missing
STYLE has 52261 (32.9%) missing values Missing
STRUCT has 52261 (32.9%) missing values Missing
GRADE has 52261 (32.9%) missing values Missing
CNDTN has 52261 (32.9%) missing values Missing
EXTWALL has 52261 (32.9%) missing values Missing
ROOF has 52261 (32.9%) missing values Missing
INTWALL has 52261 (32.9%) missing values Missing
KITCHENS has 52262 (32.9%) missing values Missing
CMPLX_NUM has 106696 (67.1%) missing values Missing
LIVING_GBA has 106696 (67.1%) missing values Missing
FULLADDRESS has 52917 (33.3%) missing values Missing
CITY has 52906 (33.3%) missing values Missing
STATE has 52906 (33.3%) missing values Missing
NATIONALGRID has 52906 (33.3%) missing values Missing
ASSESSMENT_SUBNBHD has 32551 (20.5%) missing values Missing
CENSUS_BLOCK has 52906 (33.3%) missing values Missing
YR_RMDL is highly skewed (γ1 = -21.69324411) Skewed
STORIES is highly skewed (γ1 = 228.6851767) Skewed
FIREPLACES is highly skewed (γ1 = 398.5490354) Skewed
LANDAREA is highly skewed (γ1 = 78.59012056) Skewed
Unnamed: 0 is uniformly distributed Uniform
FULLADDRESS is uniformly distributed Uniform
NATIONALGRID is uniformly distributed Uniform
Unnamed: 0 has unique values Unique
SQUARE is an unsupported type, check if it needs cleaning or further analysis Unsupported
HF_BATHRM has 93148 (58.6%) zeros Zeros
BEDRM has 5297 (3.3%) zeros Zeros
FIREPLACES has 103837 (65.3%) zeros Zeros
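Many of the missing-value warnings above report the same count. For example, nine building-attribute columns are each missing exactly 52261 values. The missing counts of NUM_UNITS (52261) and CMPLX_NUM (106696) also add up to the full 158957 rows. That pattern suggests, but does not prove, that the table combines two kinds of records, each filling a different set of columns.

A minimal sketch that groups columns by identical missing counts is shown below. The `missing` dictionary is copied by hand from the warnings. The helper `group_by_missing` is hypothetical, not a pandas-profiling API.

```python
from collections import defaultdict

# Missing-value counts copied from the warnings list above (a subset of columns).
missing = {
    "NUM_UNITS": 52261, "GBA": 52261, "STYLE": 52261, "STRUCT": 52261,
    "GRADE": 52261, "CNDTN": 52261, "EXTWALL": 52261, "ROOF": 52261,
    "INTWALL": 52261, "KITCHENS": 52262, "STORIES": 52305,
    "CMPLX_NUM": 106696, "LIVING_GBA": 106696,
    "FULLADDRESS": 52917, "CITY": 52906, "STATE": 52906,
    "NATIONALGRID": 52906, "CENSUS_BLOCK": 52906,
}


def group_by_missing(counts):
    """Group column names by identical missing-value counts.

    Only counts shared by two or more columns are kept. Shared counts hint
    (but do not prove) that those columns are empty on the same rows.
    """
    groups = defaultdict(list)
    for column, n_missing in counts.items():
        groups[n_missing].append(column)
    return {n: sorted(cols) for n, cols in groups.items() if len(cols) > 1}


for n_missing, cols in sorted(group_by_missing(missing).items()):
    print(n_missing, cols)

# The NUM_UNITS/GBA block and the CMPLX_NUM/LIVING_GBA block together
# account for every row in the table.
print(52261 + 106696 == 158957)  # True
```

Confirming that the two groups cover disjoint rows requires row-level data. One way to check is with pandas, e.g. `df["NUM_UNITS"].isna() == df["CMPLX_NUM"].notna()` on the loaded frame.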

Reproduction

Analysis started: 2021-07-08 05:46:53.159994
Analysis finished: 2021-07-08 05:51:09.919361
Duration: 4 minutes and 16.76 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json

Variables

Unnamed: 0
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
UNIFORM
UNIQUE

Distinct: 158957
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 79478
Minimum: 0
Maximum: 158956
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 1.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 7947.8
Q1: 39739
Median: 79478
Q3: 119217
95-th percentile: 151008.2
Maximum: 158956
Range: 158956
Interquartile range (IQR): 79478

Descriptive statistics

Standard deviation: 45887.07771
Coefficient of variation (CV): 0.5773557174
Kurtosis: -1.2
Mean: 79478
Median Absolute Deviation (MAD): 39739
Skewness: 0
Sum: 1.263358445 × 10^10
Variance: 2105623900
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)

Common Values
Value | Count | Frequency (%)
2047 | 1 | < 0.1%
7465 | 1 | < 0.1%
54576 | 1 | < 0.1%
11567 | 1 | < 0.1%
9518 | 1 | < 0.1%
15661 | 1 | < 0.1%
13612 | 1 | < 0.1%
3371 | 1 | < 0.1%
1322 | 1 | < 0.1%
5416 | 1 | < 0.1%
Other values (158947) | 158947 | > 99.9%

Minimum 10 values
Value | Count | Frequency (%)
0 | 1 | < 0.1%
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
7 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
158956 | 1 | < 0.1%
158955 | 1 | < 0.1%
158954 | 1 | < 0.1%
158953 | 1 | < 0.1%
158952 | 1 | < 0.1%
158951 | 1 | < 0.1%
158950 | 1 | < 0.1%
158949 | 1 | < 0.1%
158948 | 1 | < 0.1%
158947 | 1 | < 0.1%
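"Unnamed: 0" is the name pandas' `read_csv` gives to an unlabeled leading column. This column's profile (the integers 0 to 158956, strictly increasing and all unique) matches a row index that was saved along with the data. Its moments therefore follow from closed-form formulas for 0, 1, ..., n-1. The sketch below assumes the report uses the sample (ddof=1) variance. The helper name `range_index_stats` is hypothetical.

```python
import math


def range_index_stats(n: int) -> dict:
    """Closed-form statistics of the integers 0, 1, ..., n-1."""
    mean = (n - 1) / 2
    var = n * (n + 1) / 12          # sample variance with ddof=1
    std = math.sqrt(var)
    return {
        "mean": mean,
        "variance": var,
        "std": std,
        "cv": std / mean,           # coefficient of variation
        "sum": n * (n - 1) // 2,
    }


stats = range_index_stats(158957)
print(stats["mean"], stats["sum"])  # 79478.0 12633584446
```

The results agree with the reported mean, sum, standard deviation and CV. The column carries no information about the properties, so it is usually dropped, or the file is re-read with `index_col=0`. That also explains its spurious "high correlation" with ROOMS and other fields, which presumably reflects row order.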

BATHRM
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 15
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.81067836
Minimum: 0
Maximum: 14
Zeros: 58
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 1.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
Median: 2
Q3: 2
95-th percentile: 4
Maximum: 14
Range: 14
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.9763959589
Coefficient of variation (CV): 0.5392431813
Kurtosis: 3.893857213
Mean: 1.81067836
Median Absolute Deviation (MAD): 1
Skewness: 1.514664494
Sum: 287820
Variance: 0.9533490686
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=15)

Common Values
Value | Count | Frequency (%)
1 | 74555 | 46.9%
2 | 53325 | 33.5%
3 | 20785 | 13.1%
4 | 8119 | 5.1%
5 | 1367 | 0.9%
6 | 500 | 0.3%
7 | 129 | 0.1%
8 | 71 | < 0.1%
0 | 58 | < 0.1%
9 | 22 | < 0.1%
Other values (5) | 26 | < 0.1%

Minimum 10 values
Value | Count | Frequency (%)
0 | 58 | < 0.1%
1 | 74555 | 46.9%
2 | 53325 | 33.5%
3 | 20785 | 13.1%
4 | 8119 | 5.1%
5 | 1367 | 0.9%
6 | 500 | 0.3%
7 | 129 | 0.1%
8 | 71 | < 0.1%
9 | 22 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
14 | 1 | < 0.1%
13 | 1 | < 0.1%
12 | 3 | < 0.1%
11 | 7 | < 0.1%
10 | 14 | < 0.1%
9 | 22 | < 0.1%
8 | 71 | < 0.1%
7 | 129 | 0.1%
6 | 500 | 0.3%
5 | 1367 | 0.9%
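BATHRM takes only 15 distinct values, and together the tables above list all of them. The count, sum, mean, variance and CV can therefore be recomputed from the frequency table alone. The sketch below assembles the value counts by hand from those tables and assumes the sample (ddof=1) variance. The helper name `freq_stats` is hypothetical.

```python
import math

# BATHRM value -> count, assembled from the common and extreme value tables above.
bathrm_counts = {0: 58, 1: 74555, 2: 53325, 3: 20785, 4: 8119, 5: 1367, 6: 500,
                 7: 129, 8: 71, 9: 22, 10: 14, 11: 7, 12: 3, 13: 1, 14: 1}


def freq_stats(counts: dict) -> dict:
    """Weighted summary statistics from a value -> count table."""
    n = sum(counts.values())
    total = sum(value * count for value, count in counts.items())
    mean = total / n
    # Sum of squared deviations, weighted by how often each value occurs.
    ss = sum(count * (value - mean) ** 2 for value, count in counts.items())
    var = ss / (n - 1)              # sample variance with ddof=1
    std = math.sqrt(var)
    return {"n": n, "sum": total, "mean": mean,
            "variance": var, "std": std, "cv": std / mean}


s = freq_stats(bathrm_counts)
print(s["n"], s["sum"])  # 158957 287820
```

The same approach works for any low-cardinality numeric column, and it is a quick way to confirm that a reconstructed table is complete. If any value were missing, `n` would fall short of 158957.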

HF_BATHRM
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4582371333
Minimum: 0
Maximum: 11
Zeros: 93148
Zeros (%): 58.6%
Negative: 0
Negative (%): 0.0%
Memory size: 1.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 1
95-th percentile: 1
Maximum: 11
Range: 11
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.5875714745
Coefficient of variation (CV): 1.282243257
Kurtosis: 2.074616926
Mean: 0.4582371333
Median Absolute Deviation (MAD): 0
Skewness: 1.074096595
Sum: 72840
Variance: 0.3452402376
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)

Common Values
Value | Count | Frequency (%)
0 | 93148 | 58.6%
1 | 59258 | 37.3%
2 | 6186 | 3.9%
3 | 289 | 0.2%
4 | 56 | < 0.1%
5 | 12 | < 0.1%
7 | 3 | < 0.1%
6 | 3 | < 0.1%
11 | 1 | < 0.1%
9 | 1 | < 0.1%

Minimum 10 values
Value | Count | Frequency (%)
0 | 93148 | 58.6%
1 | 59258 | 37.3%
2 | 6186 | 3.9%
3 | 289 | 0.2%
4 | 56 | < 0.1%
5 | 12 | < 0.1%
6 | 3 | < 0.1%
7 | 3 | < 0.1%
9 | 1 | < 0.1%
11 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
11 | 1 | < 0.1%
9 | 1 | < 0.1%
7 | 3 | < 0.1%
6 | 3 | < 0.1%
5 | 12 | < 0.1%
4 | 56 | < 0.1%
3 | 289 | 0.2%
2 | 6186 | 3.9%
1 | 59258 | 37.3%
0 | 93148 | 58.6%

HEAT
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.2 MiB
Forced Air | 53972
Hot Water Rad | 47202
Warm Cool | 33628
Ht Pump | 21412
Wall Furnace | 1120
Other values (9) | 1623

Length

Max length: 14
Median length: 10
Mean length: 10.30165391
Min length: 7

Characters and Unicode

Total characters: 1637520
Distinct characters: 36
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Warm Cool
2nd row: Warm Cool
3rd row: Hot Water Rad
4th row: Hot Water Rad
5th row: Warm Cool

Common Values

Value | Count | Frequency (%)
Forced Air | 53972 | 34.0%
Hot Water Rad | 47202 | 29.7%
Warm Cool | 33628 | 21.2%
Ht Pump | 21412 | 13.5%
Wall Furnace | 1120 | 0.7%
Water Base Brd | 402 | 0.3%
Elec Base Brd | 351 | 0.2%
No Data | 330 | 0.2%
Electric Rad | 144 | 0.1%
Gravity Furnac | 140 | 0.1%
Other values (4) | 256 | 0.2%

Length

Histogram of lengths of the category

Most common words
Value | Count | Frequency (%)
air | 54011 | 14.8%
forced | 53972 | 14.8%
water | 47604 | 13.0%
rad | 47346 | 12.9%
hot | 47202 | 12.9%
cool | 33678 | 9.2%
warm | 33628 | 9.2%
pump | 21412 | 5.9%
ht | 21412 | 5.9%
furnace | 1120 | 0.3%
Other values (14) | 4367 | 1.2%

Most occurring characters

Value | Count | Frequency (%)
(space) | 206795 | 12.6%
r | 191629 | 11.7%
o | 168860 | 10.3%
a | 132511 | 8.1%
t | 116882 | 7.1%
e | 103944 | 6.3%
d | 102121 | 6.2%
W | 82352 | 5.0%
H | 68614 | 4.2%
c | 55910 | 3.4%
Other values (26) | 407902 | 24.9%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 1064739 | 65.0%
Uppercase Letter | 365869 | 22.3%
Space Separator | 206795 | 12.6%
Dash Punctuation | 117 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
r | 191629 | 18.0%
o | 168860 | 15.9%
a | 132511 | 12.4%
t | 116882 | 11.0%
e | 103944 | 9.8%
d | 102121 | 9.6%
c | 55910 | 5.3%
m | 55040 | 5.2%
i | 54579 | 5.1%
l | 36530 | 3.4%
Other values (9) | 46733 | 4.4%
Uppercase Letter
Value | Count | Frequency (%)
W | 82352 | 22.5%
H | 68614 | 18.8%
F | 55232 | 15.1%
A | 54128 | 14.8%
R | 47346 | 12.9%
C | 33678 | 9.2%
P | 21412 | 5.9%
B | 1506 | 0.4%
E | 584 | 0.2%
N | 330 | 0.1%
Other values (5) | 687 | 0.2%
Space Separator
Value | Count | Frequency (%)
(space) | 206795 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 117 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 1430608 | 87.4%
Common | 206912 | 12.6%

Most frequent character per script

Latin
Value | Count | Frequency (%)
r | 191629 | 13.4%
o | 168860 | 11.8%
a | 132511 | 9.3%
t | 116882 | 8.2%
e | 103944 | 7.3%
d | 102121 | 7.1%
W | 82352 | 5.8%
H | 68614 | 4.8%
c | 55910 | 3.9%
F | 55232 | 3.9%
Other values (24) | 352553 | 24.6%
Common
Value | Count | Frequency (%)
(space) | 206795 | 99.9%
- | 117 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1637520 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 206795 | 12.6%
r | 191629 | 11.7%
o | 168860 | 10.3%
a | 132511 | 8.1%
t | 116882 | 7.1%
e | 103944 | 6.3%
d | 102121 | 6.2%
W | 82352 | 5.0%
H | 68614 | 4.2%
c | 55910 | 3.4%
Other values (26) | 407902 | 24.9%

AC
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.2 MiB
Y | 114620
N | 44272
0 | 65

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 158957
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Y
2nd row: Y
3rd row: Y
4th row: Y
5th row: Y

Common Values

Value | Count | Frequency (%)
Y | 114620 | 72.1%
N | 44272 | 27.9%
0 | 65 | < 0.1%

Length

Histogram of lengths of the category

Most common words
Value | Count | Frequency (%)
y | 114620 | 72.1%
n | 44272 | 27.9%
0 | 65 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
Y | 114620 | 72.1%
N | 44272 | 27.9%
0 | 65 | < 0.1%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 158892 | > 99.9%
Decimal Number | 65 | < 0.1%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
Y | 114620 | 72.1%
N | 44272 | 27.9%
Decimal Number
Value | Count | Frequency (%)
0 | 65 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 158892 | > 99.9%
Common | 65 | < 0.1%

Most frequent character per script

Latin
Value | Count | Frequency (%)
Y | 114620 | 72.1%
N | 44272 | 27.9%
Common
Value | Count | Frequency (%)
0 | 65 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 158957 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
Y | 114620 | 72.1%
N | 44272 | 27.9%
0 | 65 | < 0.1%
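AC is a yes/no flag, but 65 rows carry the code "0". That code is also why the report lists two scripts and two Unicode categories for a one-character column. The sketch below maps the codes to booleans and treats anything other than Y or N as unknown (`None`). Reading "0" as "unknown" rather than "no" is an assumption. The report only shows that the code occurs. The helper `normalize_ac` is hypothetical.

```python
from collections import Counter
from typing import Optional


def normalize_ac(code: str) -> Optional[bool]:
    """Map an AC code to True/False; any other code becomes None (unknown).

    Treating "0" as unknown is an assumption, not documented by the data.
    """
    return {"Y": True, "N": False}.get(code.strip().upper())


# Value counts from the AC "Common Values" table above.
ac_counts = {"Y": 114620, "N": 44272, "0": 65}

normalized = Counter()
for code, n in ac_counts.items():
    normalized[normalize_ac(code)] += n

print(normalized[True], normalized[False], normalized[None])  # 114620 44272 65
```

With pandas, the same mapping could be applied as `df["AC"].map({"Y": True, "N": False})`. That call leaves unmapped codes as NaN.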

NUM_UNITS
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 7
Distinct (%): < 0.1%
Missing: 52261
Missing (%): 32.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.198039289
Minimum: 0
Maximum: 6
Zeros: 168
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 1.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1
95-th percentile: 2
Maximum: 6
Range: 6
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.5969244151
Coefficient of variation (CV): 0.4982511179
Kurtosis: 12.38589718
Mean: 1.198039289
Median Absolute Deviation (MAD): 0
Skewness: 3.467857332
Sum: 127826
Variance: 0.3563187574
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)

Common Values
Value | Count | Frequency (%)
1 | 92491 | 58.2%
2 | 9864 | 6.2%
4 | 3059 | 1.9%
3 | 1101 | 0.7%
0 | 168 | 0.1%
5 | 10 | < 0.1%
6 | 3 | < 0.1%
(Missing) | 52261 | 32.9%

Minimum 10 values
Value | Count | Frequency (%)
0 | 168 | 0.1%
1 | 92491 | 58.2%
2 | 9864 | 6.2%
3 | 1101 | 0.7%
4 | 3059 | 1.9%
5 | 10 | < 0.1%
6 | 3 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
6 | 3 | < 0.1%
5 | 10 | < 0.1%
4 | 3059 | 1.9%
3 | 1101 | 0.7%
2 | 9864 | 6.2%
1 | 92491 | 58.2%
0 | 168 | 0.1%

ROOMS
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 40
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.187736306
Minimum: 0
Maximum: 48
Zeros: 138
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 1.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 3
Q1: 4
Median: 6
Q3: 7
95-th percentile: 11
Maximum: 48
Range: 48
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.618164876
Coefficient of variation (CV): 0.4231215984
Kurtosis: 4.563615163
Mean: 6.187736306
Median Absolute Deviation (MAD): 2
Skewness: 1.283358565
Sum: 983584
Variance: 6.854787319
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=40)

Common Values
Value | Count | Frequency (%)
6 | 37259 | 23.4%
7 | 22338 | 14.1%
4 | 20593 | 13.0%
3 | 17759 | 11.2%
5 | 16852 | 10.6%
8 | 16327 | 10.3%
9 | 7616 | 4.8%
10 | 5909 | 3.7%
2 | 5294 | 3.3%
12 | 2929 | 1.8%
Other values (30) | 6081 | 3.8%

Minimum 10 values
Value | Count | Frequency (%)
0 | 138 | 0.1%
1 | 96 | 0.1%
2 | 5294 | 3.3%
3 | 17759 | 11.2%
4 | 20593 | 13.0%
5 | 16852 | 10.6%
6 | 37259 | 23.4%
7 | 22338 | 14.1%
8 | 16327 | 10.3%
9 | 7616 | 4.8%

Maximum 10 values
Value | Count | Frequency (%)
48 | 1 | < 0.1%
41 | 1 | < 0.1%
40 | 1 | < 0.1%
39 | 2 | < 0.1%
37 | 1 | < 0.1%
35 | 1 | < 0.1%
34 | 1 | < 0.1%
32 | 1 | < 0.1%
31 | 1 | < 0.1%
30 | 1 | < 0.1%

BEDRM
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 20
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.732506275
Minimum: 0
Maximum: 24
Zeros: 5297
Zeros (%): 3.3%
Negative: 0
Negative (%): 0.0%
Memory size: 1.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 2
Median: 3
Q3: 3
95-th percentile: 5
Maximum: 24
Range: 24
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.358864244
Coefficient of variation (CV): 0.497295928
Kurtosis: 2.951014382
Mean: 2.732506275
Median Absolute Deviation (MAD): 1
Skewness: 0.7307726738
Sum: 434351
Variance: 1.846512034
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=20)

Common Values
Value | Count | Frequency (%)
3 | 57864 | 36.4%
2 | 34946 | 22.0%
4 | 24893 | 15.7%
1 | 24181 | 15.2%
5 | 6898 | 4.3%
0 | 5297 | 3.3%
6 | 3090 | 1.9%
8 | 792 | 0.5%
7 | 750 | 0.5%
9 | 123 | 0.1%
Other values (10) | 123 | 0.1%

Minimum 10 values
Value | Count | Frequency (%)
0 | 5297 | 3.3%
1 | 24181 | 15.2%
2 | 34946 | 22.0%
3 | 57864 | 36.4%
4 | 24893 | 15.7%
5 | 6898 | 4.3%
6 | 3090 | 1.9%
7 | 750 | 0.5%
8 | 792 | 0.5%
9 | 123 | 0.1%

Maximum 10 values
Value | Count | Frequency (%)
24 | 1 | < 0.1%
20 | 1 | < 0.1%
19 | 1 | < 0.1%
16 | 2 | < 0.1%
15 | 3 | < 0.1%
14 | 2 | < 0.1%
13 | 4 | < 0.1%
12 | 34 | < 0.1%
11 | 13 | < 0.1%
10 | 62 | < 0.1%

AYB
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION

Distinct: 220
Distinct (%): 0.1%
Missing: 271
Missing (%): 0.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 1941.987579
Minimum: 1754
Maximum: 2019
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.2 MiB

Quantile statistics

Minimum: 1754
5-th percentile: 1900
Q1: 1918
Median: 1937
Q3: 1960
95-th percentile: 2007
Maximum: 2019
Range: 265
Interquartile range (IQR): 42

Descriptive statistics

Standard deviation: 33.64023358
Coefficient of variation (CV): 0.01732257916
Kurtosis: -0.07799373814
Mean: 1941.987579
Median Absolute Deviation (MAD): 21
Skewness: 0.5112530606
Sum: 308166241
Variance: 1131.665315
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values
Value | Count | Frequency (%)
1900 | 8967 | 5.6%
1925 | 5129 | 3.2%
1910 | 4563 | 2.9%
1940 | 4316 | 2.7%
1923 | 3724 | 2.3%
1927 | 3707 | 2.3%
1941 | 3420 | 2.2%
1926 | 3117 | 2.0%
1942 | 3058 | 1.9%
1939 | 2849 | 1.8%
Other values (210) | 115836 | 72.9%

Minimum 10 values
Value | Count | Frequency (%)
1754 | 2 | < 0.1%
1765 | 1 | < 0.1%
1776 | 3 | < 0.1%
1780 | 4 | < 0.1%
1782 | 1 | < 0.1%
1784 | 1 | < 0.1%
1785 | 1 | < 0.1%
1787 | 1 | < 0.1%
1788 | 1 | < 0.1%
1790 | 3 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
2019 | 1 | < 0.1%
2018 | 98 | 0.1%
2017 | 592 | 0.4%
2016 | 1016 | 0.6%
2015 | 906 | 0.6%
2014 | 683 | 0.4%
2013 | 794 | 0.5%
2012 | 438 | 0.3%
2011 | 389 | 0.2%
2010 | 380 | 0.2%

YR_RMDL
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
MISSING
SKEWED

Distinct: 110
Distinct (%): 0.1%
Missing: 78029
Missing (%): 49.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 1998.243537
Minimum: 20
Maximum: 2019
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.2 MiB

Quantile statistics

Minimum: 20
5-th percentile: 1973
Q1: 1985
Median: 2004
Q3: 2010
95-th percentile: 2016
Maximum: 2019
Range: 1999
Interquartile range (IQR): 25

Descriptive statistics

Standard deviation: 16.57578569
Coefficient of variation (CV): 0.008295177927
Kurtosis: 2506.352909
Mean: 1998.243537
Median Absolute Deviation (MAD): 9
Skewness: -21.69324411
Sum: 161713853
Variance: 274.7566711
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values
Value | Count | Frequency (%)
2006 | 5029 | 3.2%
2005 | 4937 | 3.1%
2004 | 3985 | 2.5%
2007 | 3771 | 2.4%
1980 | 3310 | 2.1%
2003 | 2951 | 1.9%
2011 | 2856 | 1.8%
2008 | 2766 | 1.7%
1978 | 2690 | 1.7%
2010 | 2680 | 1.7%
Other values (100) | 45953 | 28.9%
(Missing) | 78029 | 49.1%

Minimum 10 values
Value | Count | Frequency (%)
20 | 1 | < 0.1%
1880 | 2 | < 0.1%
1900 | 2 | < 0.1%
1910 | 1 | < 0.1%
1911 | 1 | < 0.1%
1912 | 1 | < 0.1%
1913 | 1 | < 0.1%
1915 | 1 | < 0.1%
1916 | 2 | < 0.1%
1917 | 2 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
2019 | 1 | < 0.1%
2018 | 417 | 0.3%
2017 | 1991 | 1.3%
2016 | 2190 | 1.4%
2015 | 2595 | 1.6%
2014 | 2645 | 1.7%
2013 | 2561 | 1.6%
2012 | 2648 | 1.7%
2011 | 2856 | 1.8%
2010 | 2680 | 1.7%
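The YR_RMDL minimum of 20 is not a plausible four-digit year. That single value largely explains the extreme skewness (-21.7), the kurtosis (2506) and the 1999-year range. A minimal row-level sanity check is sketched below. It also compares the remodel year with AYB, on the assumption that a building cannot be remodelled before it was built. The helper `flag_remodel_year` and the cut-off of 1000 are hypothetical choices.

```python
from typing import Optional


def flag_remodel_year(ayb: Optional[int], yr_rmdl: Optional[int]) -> Optional[str]:
    """Return a reason string if YR_RMDL looks implausible, else None.

    The cut-off of 1000 and the comparison with AYB are illustrative assumptions.
    """
    if yr_rmdl is None:
        return None                        # missing (49.1% of rows): nothing to check
    if yr_rmdl < 1000:
        return "not a four-digit year"     # e.g. the minimum of 20 reported above
    if ayb is not None and yr_rmdl < ayb:
        return "remodelled before it was built"
    return None


# Hypothetical rows (AYB, YR_RMDL) for illustration only.
for row in [(1937, 20), (1937, 2004), (1960, 1950), (1937, None)]:
    print(row, flag_remodel_year(*row))
```

Flagged rows can then be set to missing or reviewed by hand before the column is used.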

EYB
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION

Distinct: 135
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1963.718024
Minimum: 1800
Maximum: 2018
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.2 MiB

Quantile statistics

Minimum: 1800
5-th percentile: 1919
Q1: 1954
Median: 1963
Q3: 1975
95-th percentile: 2009
Maximum: 2018
Range: 218
Interquartile range (IQR): 21

Descriptive statistics

Standard deviation: 24.92315012
Coefficient of variation (CV): 0.01269181716
Kurtosis: 0.6211768164
Mean: 1963.718024
Median Absolute Deviation (MAD): 9
Skewness: -0.1224757013
Sum: 312146726
Variance: 621.1634118
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values
Value | Count | Frequency (%)
1957 | 12541 | 7.9%
1954 | 12346 | 7.8%
1967 | 10408 | 6.5%
1964 | 9362 | 5.9%
1960 | 7636 | 4.8%
1969 | 7033 | 4.4%
1943 | 5022 | 3.2%
1919 | 4707 | 3.0%
1950 | 4522 | 2.8%
1947 | 3718 | 2.3%
Other values (125) | 81662 | 51.4%

Minimum 10 values
Value | Count | Frequency (%)
1800 | 4 | < 0.1%
1820 | 6 | < 0.1%
1865 | 4 | < 0.1%
1870 | 10 | < 0.1%
1875 | 100 | 0.1%
1876 | 6 | < 0.1%
1880 | 55 | < 0.1%
1885 | 153 | 0.1%
1886 | 4 | < 0.1%
1887 | 23 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
2018 | 186 | 0.1%
2017 | 886 | 0.6%
2016 | 1032 | 0.6%
2015 | 1360 | 0.9%
2014 | 716 | 0.5%
2013 | 886 | 0.6%
2012 | 507 | 0.3%
2011 | 726 | 0.5%
2010 | 963 | 0.6%
2009 | 707 | 0.4%

STORIES
Real number (ℝ≥0)

MISSING
SKEWED

Distinct: 40
Distinct (%): < 0.1%
Missing: 52305
Missing (%): 32.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.091793122
Minimum: 0
Maximum: 826
Zeros: 43
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 1.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 1.5
Q1: 2
Median: 2
Q3: 2
95-th percentile: 3
Maximum: 826
Range: 826
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 2.933322657
Coefficient of variation (CV): 1.402300556
Kurtosis: 60245.74008
Mean: 2.091793122
Median Absolute Deviation (MAD): 0
Skewness: 228.6851767
Sum: 223093.92
Variance: 8.604381812
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=40)

Common Values
Value | Count | Frequency (%)
2 | 79357 | 49.9%
3 | 9230 | 5.8%
2.5 | 6105 | 3.8%
1 | 4683 | 2.9%
1.5 | 2291 | 1.4%
2.25 | 2225 | 1.4%
1.75 | 1175 | 0.7%
1.25 | 452 | 0.3%
2.75 | 444 | 0.3%
4 | 375 | 0.2%
Other values (30) | 315 | 0.2%
(Missing) | 52305 | 32.9%
ValueCountFrequency (%)
043
 
< 0.1%
0.251
 
< 0.1%
0.51
 
< 0.1%
0.751
 
< 0.1%
14683
2.9%
1.25452
 
0.3%
1.341
 
< 0.1%
1.52291
1.4%
1.74
 
< 0.1%
1.751175
 
0.7%
ValueCountFrequency (%)
8261
 
< 0.1%
2752
 
< 0.1%
2501
 
< 0.1%
651
 
< 0.1%
431
 
< 0.1%
254
 
< 0.1%
201
 
< 0.1%
121
 
< 0.1%
936
< 0.1%
8.251
 
< 0.1%
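The skewness values in this report (STORIES γ1 = 228.7, YR_RMDL γ1 = -21.7) are very sensitive to a handful of extreme values. For STORIES, the maximum-values table includes 826, 275 and 250 stories, although the median is 2. The sketch below computes the adjusted Fisher-Pearson coefficient G1. It assumes the report delegates to pandas' `Series.skew`, which implements that bias-corrected estimator. The function name `adjusted_skewness` is hypothetical.

```python
import math
from typing import Sequence


def adjusted_skewness(xs: Sequence[float]) -> float:
    """Adjusted Fisher-Pearson skewness G1 = sqrt(n(n-1)) / (n-2) * g1."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n   # third central moment
    g1 = m3 / m2 ** 1.5                          # biased (population) skewness
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


print(adjusted_skewness([1.0, 2.0, 3.0]))            # 0.0
print(adjusted_skewness([2.0] * 99 + [826.0]) > 5)   # True: one extreme value dominates
```

Because a single implausible value can drive γ1 this high, it is worth cleaning such values (or using robust measures such as the reported MAD and IQR) before reading the skewness as a property of the whole column.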

SALEDATE
Categorical

HIGH CARDINALITY
MISSING

Distinct: 6937
Distinct (%): 5.2%
Missing: 26770
Missing (%): 16.8%
Memory size: 1.2 MiB
2007-04-10 00:00:00 | 413
1999-04-01 00:00:00 | 266
2001-01-01 00:00:00 | 258
2015-11-17 00:00:00 | 160
2010-05-04 00:00:00 | 134
Other values (6932) | 130956

Length

Max length: 19
Median length: 19
Mean length: 19
Min length: 19

Characters and Unicode

Total characters: 2511553
Distinct characters: 13
Distinct categories: 4
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 539
Unique (%): 0.4%

Sample

1st row: 2003-11-25 00:00:00
2nd row: 2000-08-17 00:00:00
3rd row: 2016-06-21 00:00:00
4th row: 2006-07-12 00:00:00
5th row: 2010-02-26 00:00:00

Common Values

Value | Count | Frequency (%)
2007-04-10 00:00:00 | 413 | 0.3%
1999-04-01 00:00:00 | 266 | 0.2%
2001-01-01 00:00:00 | 258 | 0.2%
2015-11-17 00:00:00 | 160 | 0.1%
2010-05-04 00:00:00 | 134 | 0.1%
2017-06-14 00:00:00 | 124 | 0.1%
2018-05-29 00:00:00 | 104 | 0.1%
2016-10-31 00:00:00 | 95 | 0.1%
2018-07-03 00:00:00 | 88 | 0.1%
2018-05-15 00:00:00 | 85 | 0.1%
Other values (6927) | 130460 | 82.1%
(Missing) | 26770 | 16.8%

Length

Histogram of lengths of the category

Most common words
Value | Count | Frequency (%)
00:00:00 | 132187 | 50.0%
2007-04-10 | 413 | 0.2%
1999-04-01 | 266 | 0.1%
2001-01-01 | 258 | 0.1%
2015-11-17 | 160 | 0.1%
2010-05-04 | 134 | 0.1%
2017-06-14 | 124 | < 0.1%
2018-05-29 | 104 | < 0.1%
2016-10-31 | 95 | < 0.1%
2018-07-03 | 88 | < 0.1%
Other values (6928) | 130545 | 49.4%

Most occurring characters

Value | Count | Frequency (%)
0 | 1133288 | 45.1%
- | 264374 | 10.5%
: | 264374 | 10.5%
1 | 204851 | 8.2%
2 | 204719 | 8.2%
(space) | 132187 | 5.3%
9 | 56502 | 2.2%
7 | 45174 | 1.8%
3 | 44372 | 1.8%
6 | 42700 | 1.7%
Other values (3) | 119012 | 4.7%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 1850618 | 73.7%
Dash Punctuation | 264374 | 10.5%
Other Punctuation | 264374 | 10.5%
Space Separator | 132187 | 5.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 1133288 | 61.2%
1 | 204851 | 11.1%
2 | 204719 | 11.1%
9 | 56502 | 3.1%
7 | 45174 | 2.4%
3 | 44372 | 2.4%
6 | 42700 | 2.3%
5 | 41316 | 2.2%
4 | 39264 | 2.1%
8 | 38432 | 2.1%
Dash Punctuation
Value | Count | Frequency (%)
- | 264374 | 100.0%
Space Separator
Value | Count | Frequency (%)
(space) | 132187 | 100.0%
Other Punctuation
Value | Count | Frequency (%)
: | 264374 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 2511553 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 1133288 | 45.1%
- | 264374 | 10.5%
: | 264374 | 10.5%
1 | 204851 | 8.2%
2 | 204719 | 8.2%
(space) | 132187 | 5.3%
9 | 56502 | 2.2%
7 | 45174 | 1.8%
3 | 44372 | 1.8%
6 | 42700 | 1.7%
Other values (3) | 119012 | 4.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2511553 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 1133288 | 45.1%
- | 264374 | 10.5%
: | 264374 | 10.5%
1 | 204851 | 8.2%
2 | 204719 | 8.2%
(space) | 132187 | 5.3%
9 | 56502 | 2.2%
7 | 45174 | 1.8%
3 | 44372 | 1.8%
6 | 42700 | 1.7%
Other values (3) | 119012 | 4.7%
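SALEDATE was profiled as text, but every non-missing value has the same 19-character `YYYY-MM-DD HH:MM:SS` layout. The time part `00:00:00` appears 132187 times, once per non-missing row, so the column is effectively a date. The sketch below parses it with the standard library. The helper `parse_saledate` is hypothetical. With pandas, `pd.to_datetime` on the column would be the usual route.

```python
from datetime import date, datetime
from typing import Optional


def parse_saledate(raw: Optional[str]) -> Optional[date]:
    """Parse 'YYYY-MM-DD HH:MM:SS' into a date; the time is always midnight here."""
    if raw is None:
        return None
    return datetime.strptime(raw, "%Y-%m-%d %H:%M:%S").date()


print(parse_saledate("2007-04-10 00:00:00"))  # 2007-04-10

# Consistency check against the report: non-missing rows x 19 characters.
non_missing = 158957 - 26770
total_chars = non_missing * 19
print(non_missing, total_chars)  # 132187 2511553
```

The character total matches the reported 2511553, which confirms that no value deviates from the fixed-width layout.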

PRICE
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 13486
Distinct (%): 13.7%
Missing: 60741
Missing (%): 38.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 931351.5949
Minimum: 1
Maximum: 137427545
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.2 MiB

Quantile statistics

Minimum: 1
5-th percentile: 93086.75
Q1: 240000
Median: 399999
Q3: 652000
95-th percentile: 1350000
Maximum: 137427545
Range: 137427544
Interquartile range (IQR): 412000

Descriptive statistics

Standard deviation: 7061324.956
Coefficient of variation (CV): 7.581803686
Kurtosis: 344.9019408
Mean: 931351.5949
Median Absolute Deviation (MAD): 192999
Skewness: 18.3162491
Sum: 9.147362825 × 10^10
Variance: 4.986231013 × 10^13
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values
Value | Count | Frequency (%)
350000 | 595 | 0.4%
250000 | 536 | 0.3%
300000 | 523 | 0.3%
450000 | 519 | 0.3%
375000 | 488 | 0.3%
325000 | 482 | 0.3%
550000 | 459 | 0.3%
275000 | 455 | 0.3%
500000 | 426 | 0.3%
320000 | 425 | 0.3%
Other values (13476) | 93308 | 58.7%
(Missing) | 60741 | 38.2%

Minimum 10 values
Value | Count | Frequency (%)
1 | 5 | < 0.1%
10 | 4 | < 0.1%
250 | 14 | < 0.1%
500 | 4 | < 0.1%
936 | 1 | < 0.1%
1000 | 5 | < 0.1%
1377 | 1 | < 0.1%
2000 | 2 | < 0.1%
3000 | 2 | < 0.1%
3270 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
137427545 | 242 | 0.2%
53969391 | 118 | 0.1%
53696391 | 1 | < 0.1%
25100000 | 1 | < 0.1%
25000000 | 1 | < 0.1%
23960287 | 1 | < 0.1%
22000000 | 1 | < 0.1%
18000000 | 1 | < 0.1%
16100000 | 1 | < 0.1%
15000000 | 1 | < 0.1%
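PRICE has a very long right tail (skewness 18.3, CV 7.58). Its minimum values include nominal amounts such as 1 and 10. The maximum, 137427545, is shared by 242 rows, which is worth investigating before the column is used for modelling. One common screen is Tukey's fences, computed below from the reported Q1 and Q3. The multiplier k = 1.5 is the conventional choice rather than something the report specifies. The helper `tukey_fences` is hypothetical.

```python
from typing import Tuple


def tukey_fences(q1: float, q3: float, k: float = 1.5) -> Tuple[float, float]:
    """Return (lower, upper) Tukey fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower, upper = tukey_fences(240000, 652000)
print(lower, upper)  # -378000.0 1270000.0
```

The lower fence is negative, so it flags none of the low prices. Those values would need a domain rule instead, such as a minimum plausible sale price. The upper fence (1270000) sits below the reported 95-th percentile (1350000), so more than 5% of priced rows fall above it. A log transform of PRICE before screening is often a better fit for a distribution this skewed.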

QUALIFIED
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.2 MiB
U | 82608
Q | 76349

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 158957
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Q
2nd row: U
3rd row: Q
4th row: Q
5th row: U

Common Values

Value | Count | Frequency (%)
U | 82608 | 52.0%
Q | 76349 | 48.0%

Length

Histogram of lengths of the category

Most common words
Value | Count | Frequency (%)
u | 82608 | 52.0%
q | 76349 | 48.0%

Most occurring characters

Value | Count | Frequency (%)
U | 82608 | 52.0%
Q | 76349 | 48.0%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 158957 | 100.0%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
U | 82608 | 52.0%
Q | 76349 | 48.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 158957 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
U | 82608 | 52.0%
Q | 76349 | 48.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 158957 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
U | 82608 | 52.0%
Q | 76349 | 48.0%

SALE_NUM
Real number (ℝ≥0)

Distinct15
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean1.680032965
Minimum1
Maximum15
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size1.2 MiB

Quantile statistics

Minimum1
5-th percentile1
Q1 1
median 1
Q3 2
95-th percentile4
Maximum15
Range14
Interquartile range (IQR)1

Descriptive statistics

Standard deviation1.285898145
Coefficient of variation (CV)0.7654005437
Kurtosis4.895860251
Mean1.680032965
Median Absolute Deviation (MAD)0
Skewness2.131739779
Sum267053
Variance1.653534039
MonotonicityNot monotonic
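The descriptive statistics reported for SALE_NUM and the other numeric columns follow the usual definitions:

- CV is the standard deviation divided by the mean.
- Variance is the square of the standard deviation.
- IQR is Q3 − Q1.
- MAD is the median of the absolute deviations from the median.

Here is a minimal sketch using Python's `statistics` module. Two choices are assumptions about how the report computes these values, not something the report states: the sample (n − 1) variance and linear-interpolation quartiles.

```python
import math
import statistics

def describe(values):
    """Recompute several of the report's descriptive statistics for one numeric column.

    Assumptions: missing cells are None and are dropped first; the variance is the
    sample (n - 1) variance; quartiles use linear interpolation between order statistics.
    """
    xs = sorted(v for v in values if v is not None)
    mean = statistics.fmean(xs)
    variance = statistics.variance(xs)  # sample variance, n - 1 denominator
    std = math.sqrt(variance)
    median = statistics.median(xs)
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return {
        "mean": mean,
        "std": std,
        "variance": variance,
        "cv": std / mean,  # coefficient of variation
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,  # interquartile range
        "mad": statistics.median(abs(x - median) for x in xs),  # median absolute deviation
        "range": xs[-1] - xs[0],
        "missing": sum(v is None for v in values),
    }

# Toy column shaped like SALE_NUM: mostly first sales, a short right tail, one missing cell.
stats = describe([1, 1, 1, 2, 3, 4, None])
for name, value in stats.items():
    print(f"{name}: {value}")
```

Plugging in the reported SALE_NUM values reproduces the figures above:

- CV: 1.285898145 / 1.680032965 ≈ 0.7654.
- Variance: 1.285898145² ≈ 1.6535.
- MAD of 0: more than half of the rows have SALE_NUM = 1, so more than half of the deviations from the median are 0.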
[Figure: histogram with fixed size bins (bins=15)]

Common values
Value: Count (Frequency %)
1: 113671 (71.5%)
3: 14738 (9.3%)
2: 12901 (8.1%)
4: 9851 (6.2%)
5: 4687 (2.9%)
6: 1970 (1.2%)
7: 703 (0.4%)
8: 261 (0.2%)
9: 108 (0.1%)
10: 37 (< 0.1%)
Other values (5): 30 (< 0.1%)

Smallest values
Value: Count (Frequency %)
1: 113671 (71.5%)
2: 12901 (8.1%)
3: 14738 (9.3%)
4: 9851 (6.2%)
5: 4687 (2.9%)
6: 1970 (1.2%)
7: 703 (0.4%)
8: 261 (0.2%)
9: 108 (0.1%)
10: 37 (< 0.1%)

Largest values
Value: Count (Frequency %)
15: 2 (< 0.1%)
14: 2 (< 0.1%)
13: 3 (< 0.1%)
12: 6 (< 0.1%)
11: 17 (< 0.1%)
10: 37 (< 0.1%)
9: 108 (0.1%)
8: 261 (0.2%)
7: 703 (0.4%)
6: 1970 (1.2%)

GBA
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct4764
Distinct (%)4.5%
Missing52261
Missing (%)32.9%
Infinite0
Infinite (%)0.0%
Mean1714.539889
Minimum0
Maximum45384
Zeros15
Zeros (%)< 0.1%
Negative0
Negative (%)0.0%
Memory size1.2 MiB
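The two percentage figures appear to use different denominators. Missing (%) is relative to all 158,957 rows. Distinct (%), however, is consistent with division by the non-missing rows only. For GBA, 4,764 / 106,696 ≈ 4.5%, as reported, whereas 4,764 / 158,957 would give 3.0%. CMPLX_NUM (2,913 / 52,261 ≈ 5.6%) behaves the same way. Below is a small check that uses only counts copied from the report. The non-missing denominator is an inference from these numbers.

```python
# Counts copied from the report's quick statistics.
TOTAL_ROWS = 158957
COLUMNS = {
    # name: (distinct values, missing cells)
    "GBA": (4764, 52261),
    "CMPLX_NUM": (2913, 106696),
    "LANDAREA": (11359, 0),
}

def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as the report displays it."""
    return round(100 * part / whole, 1)

for name, (distinct, missing) in COLUMNS.items():
    present = TOTAL_ROWS - missing  # assumed denominator for Distinct (%)
    print(f"{name}: Missing (%) {pct(missing, TOTAL_ROWS)}, Distinct (%) {pct(distinct, present)}")
```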

Quantile statistics

Minimum0
5-th percentile864
Q1 1190
median 1480
Q3 1966
95-th percentile3262
Maximum45384
Range45384
Interquartile range (IQR)776

Descriptive statistics

Standard deviation880.6778604
Coefficient of variation (CV)0.5136525933
Kurtosis135.3073343
Mean1714.539889
Median Absolute Deviation (MAD)342
Skewness5.635667353
Sum182934548
Variance775593.4937
MonotonicityNot monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values
Value: Count (Frequency %)
1088: 1782 (1.1%)
1152: 1661 (1.0%)
1024: 1405 (0.9%)
832: 1364 (0.9%)
1280: 1236 (0.8%)
1080: 1094 (0.7%)
1200: 877 (0.6%)
1360: 862 (0.5%)
1440: 815 (0.5%)
800: 723 (0.5%)
Other values (4754): 94877 (59.7%)
(Missing): 52261 (32.9%)

Smallest values
Value: Count (Frequency %)
0: 15 (< 0.1%)
180: 1 (< 0.1%)
252: 1 (< 0.1%)
299: 1 (< 0.1%)
340: 1 (< 0.1%)
360: 1 (< 0.1%)
371: 2 (< 0.1%)
380: 1 (< 0.1%)
392: 1 (< 0.1%)
396: 1 (< 0.1%)

Largest values
Value: Count (Frequency %)
45384: 1 (< 0.1%)
41604: 1 (< 0.1%)
27451: 1 (< 0.1%)
24030: 1 (< 0.1%)
21210: 1 (< 0.1%)
20948: 1 (< 0.1%)
20120: 1 (< 0.1%)
20015: 1 (< 0.1%)
18784: 1 (< 0.1%)
18588: 1 (< 0.1%)

BLDG_NUM
Categorical

HIGH CORRELATION

Distinct5
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size1.2 MiB
1
158884 
2
 
59
3
 
8
4
 
4
5
 
2

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters158957
Distinct characters5
Distinct categories1
Distinct scripts1
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st row1
2nd row1
3rd row1
4th row1
5th row1

Common Values

ValueCountFrequency (%)
'1'158884
> 99.9%
'2'59
< 0.1%
'3'8
< 0.1%
'4'4
< 0.1%
'5'2
< 0.1%

Length
[Figure: histogram of lengths of the category]
[Figure: pie chart]

Most occurring words

ValueCountFrequency (%)
'1'158884
> 99.9%
'2'59
< 0.1%
'3'8
< 0.1%
'4'4
< 0.1%
'5'2
< 0.1%

Most occurring characters

ValueCountFrequency (%)
'1'158884
> 99.9%
'2'59
< 0.1%
'3'8
< 0.1%
'4'4
< 0.1%
'5'2
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number158957
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
'1'158884
> 99.9%
'2'59
< 0.1%
'3'8
< 0.1%
'4'4
< 0.1%
'5'2
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Common158957
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
'1'158884
> 99.9%
'2'59
< 0.1%
'3'8
< 0.1%
'4'4
< 0.1%
'5'2
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII158957
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
'1'158884
> 99.9%
'2'59
< 0.1%
'3'8
< 0.1%
'4'4
< 0.1%
'5'2
< 0.1%

STYLE
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct18
Distinct (%)< 0.1%
Missing52261
Missing (%)32.9%
Memory size1.2 MiB
2 Story
81137 
3 Story
9449 
2.5 Story Fin
 
7000
1 Story
 
4420
1.5 Story Fin
 
2655
Other values (13)
 
2035

Length

Max length15
Median length7
Mean length7.636987328
Min length6

Characters and Unicode

Total characters814836
Distinct characters34
Distinct categories6
Distinct scripts2
Distinct blocks1
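The Length and Characters and Unicode figures are linked. Total characters is the sum of the lengths of the non-missing values, so for STYLE the mean length is 814,836 / 106,696 ≈ 7.637. Below is a minimal sketch of the length summary. The miniature STYLE column is hypothetical, and None stands for a missing cell.

```python
import statistics

def length_summary(values):
    """Length statistics for a categorical column, ignoring missing cells (None)."""
    lengths = [len(v) for v in values if v is not None]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
        "total_characters": sum(lengths),
    }

# Hypothetical miniature STYLE column.
sample = ["2 Story", "2 Story", "3 Story", "2.5 Story Fin", None]
print(length_summary(sample))

# With the report's own totals: mean length = total characters / non-missing rows.
print(814836 / (158957 - 52261))
```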

Unique

Unique1
Unique (%)< 0.1%

Sample

1st row3 Story
2nd row3 Story
3rd row3 Story
4th row3 Story
5th row3 Story

Common Values

ValueCountFrequency (%)
2 Story81137
51.0%
3 Story9449
 
5.9%
2.5 Story Fin7000
 
4.4%
1 Story4420
 
2.8%
1.5 Story Fin2655
 
1.7%
2.5 Story Unfin729
 
0.5%
4 Story369
 
0.2%
Split Level303
 
0.2%
Split Foyer279
 
0.2%
3.5 Story Fin133
 
0.1%
Other values (8)222
 
0.1%
(Missing)52261
32.9%

Length
[Figure: histogram of lengths of the category]

Most occurring words

ValueCountFrequency (%)
story106027
47.3%
'2'81137
36.2%
fin9801
4.4%
'3'9449
4.2%
'2.5'7729
3.5%
'1'4420
2.0%
'1.5'2767
1.2%
unfin851
0.4%
split582
0.3%
'4'369
0.2%
Other values (8)825
0.4%
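The lowercase word table is consistent with the following procedure: split each category on whitespace, then weight every word by its category's row count. For example, "2.5" appears 7,729 times because both 2.5 Story Fin (7,000) and 2.5 Story Unfin (729) contain it. Here is a minimal sketch under that assumption, using three STYLE counts from the Common Values table.

```python
from collections import Counter

def word_counts(value_counts):
    """Count lowercased, whitespace-separated words across category values.

    `value_counts` maps each category to its number of rows, so every word is
    weighted by how often its category occurs. Splitting on whitespace only is an
    assumption; it keeps tokens such as "2.5" and "semi-detached" intact.
    """
    words = Counter()
    for value, count in value_counts.items():
        for word in value.lower().split():
            words[word] += count
    return words

style = {"2 Story": 81137, "2.5 Story Fin": 7000, "2.5 Story Unfin": 729}
print(word_counts(style).most_common())
```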

Most occurring characters

ValueCountFrequency (%)
(space)117261
14.4%
t106677
13.1%
S106609
13.1%
o106306
13.0%
r106306
13.0%
y106306
13.0%
'2'88866
10.9%
n11506
 
1.4%
i11255
 
1.4%
'.'10652
 
1.3%
Other values (24)43092
 
5.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter452276
55.5%
Uppercase Letter117949
 
14.5%
Space Separator117261
 
14.4%
Decimal Number116679
 
14.3%
Other Punctuation10652
 
1.3%
Dash Punctuation19
 
< 0.1%
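The category tables classify each character by its Unicode general category. The names shown correspond to the codes Lu, Ll, Nd, Zs, Po and Pd. Below is a minimal sketch using Python's `unicodedata.category`. The two input values are a hypothetical sample, one STYLE-like and one STRUCT-like.

```python
import unicodedata
from collections import Counter

# Long names for the general categories that appear in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
}

def character_categories(values):
    """Count characters by Unicode general category across non-missing strings."""
    counts = Counter()
    for value in values:
        if value is None:
            continue
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

print(character_categories(["2.5 Story Fin", "Semi-Detached", None]))
```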

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t106677
23.6%
o106306
23.5%
r106306
23.5%
y106306
23.5%
n11506
 
2.5%
i11255
 
2.5%
e988
 
0.2%
l970
 
0.2%
f916
 
0.2%
p582
 
0.1%
Other values (8)464
 
0.1%
Uppercase Letter
ValueCountFrequency (%)
S106609
90.4%
F10080
 
8.5%
U851
 
0.7%
L322
 
0.3%
D65
 
0.1%
B19
 
< 0.1%
V2
 
< 0.1%
O1
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
'2'88866
76.2%
'5'10652
9.1%
'3'9590
8.2%
'1'7187
6.2%
'4'384
0.3%
Space Separator
ValueCountFrequency (%)
(space)117261
100.0%
Other Punctuation
ValueCountFrequency (%)
'.'10652
100.0%
Dash Punctuation
ValueCountFrequency (%)
'-'19
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin570225
70.0%
Common244611
30.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
t106677
18.7%
S106609
18.7%
o106306
18.6%
r106306
18.6%
y106306
18.6%
n11506
 
2.0%
i11255
 
2.0%
F10080
 
1.8%
e988
 
0.2%
l970
 
0.2%
Other values (16)3222
 
0.6%
Common
ValueCountFrequency (%)
(space)117261
47.9%
'2'88866
36.3%
'.'10652
4.4%
'5'10652
4.4%
'3'9590
3.9%
'1'7187
2.9%
'4'384
0.2%
'-'19
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII814836
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
(space)117261
14.4%
t106677
13.1%
S106609
13.1%
o106306
13.0%
r106306
13.0%
y106306
13.0%
'2'88866
10.9%
n11506
 
1.4%
i11255
 
1.4%
'.'10652
 
1.3%
Other values (24)43092
 
5.3%

STRUCT
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct9
Distinct (%)< 0.1%
Missing52261
Missing (%)32.9%
Memory size1.2 MiB
Row Inside
40593 
Single
32063 
Semi-Detached
16756 
Row End
12225 
Multi
4726 
Other values (4)
 
333

Length

Max length13
Median length10
Mean length8.70365337
Min length5

Characters and Unicode

Total characters928645
Distinct characters27
Distinct categories4
Distinct scripts2
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowRow Inside
2nd rowRow Inside
3rd rowRow Inside
4th rowRow Inside
5th rowSemi-Detached

Common Values

ValueCountFrequency (%)
Row Inside40593
25.5%
Single32063
20.2%
Semi-Detached16756
 
10.5%
Row End12225
 
7.7%
Multi4726
 
3.0%
Town Inside218
 
0.1%
Town End85
 
0.1%
Default26
 
< 0.1%
Vacant Land4
 
< 0.1%
(Missing)52261
32.9%

Length
[Figure: histogram of lengths of the category]
[Figure: pie chart]

Most occurring words

ValueCountFrequency (%)
row52818
33.0%
inside40811
25.5%
single32063
20.1%
semi-detached16756
 
10.5%
end12310
 
7.7%
multi4726
 
3.0%
town303
 
0.2%
default26
 
< 0.1%
land4
 
< 0.1%
vacant4
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
e123168
13.3%
i94356
 
10.2%
n85495
 
9.2%
d69881
 
7.5%
(space)53125
 
5.7%
o53121
 
5.7%
w53121
 
5.7%
R52818
 
5.7%
S48819
 
5.3%
I40811
 
4.4%
Other values (17)253930
27.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter682187
73.5%
Uppercase Letter176577
 
19.0%
Space Separator53125
 
5.7%
Dash Punctuation16756
 
1.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e123168
18.1%
i94356
13.8%
n85495
12.5%
d69881
10.2%
o53121
7.8%
w53121
7.8%
s40811
 
6.0%
l36815
 
5.4%
g32063
 
4.7%
t21512
 
3.2%
Other values (6)71844
10.5%
Uppercase Letter
ValueCountFrequency (%)
R52818
29.9%
S48819
27.6%
I40811
23.1%
D16782
 
9.5%
E12310
 
7.0%
M4726
 
2.7%
T303
 
0.2%
V4
 
< 0.1%
L4
 
< 0.1%
Space Separator
ValueCountFrequency (%)
(space)53125
100.0%
Dash Punctuation
ValueCountFrequency (%)
'-'16756
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin858764
92.5%
Common69881
 
7.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e123168
14.3%
i94356
11.0%
n85495
10.0%
d69881
 
8.1%
o53121
 
6.2%
w53121
 
6.2%
R52818
 
6.2%
S48819
 
5.7%
I40811
 
4.8%
s40811
 
4.8%
Other values (15)196363
22.9%
Common
ValueCountFrequency (%)
(space)53125
76.0%
'-'16756
 
24.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII928645
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e123168
13.3%
i94356
 
10.2%
n85495
 
9.2%
d69881
 
7.5%
(space)53125
 
5.7%
o53121
 
5.7%
w53121
 
5.7%
R52818
 
5.7%
S48819
 
5.3%
I40811
 
4.4%
Other values (17)253930
27.3%

GRADE
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct13
Distinct (%)< 0.1%
Missing52261
Missing (%)32.9%
Memory size1.2 MiB
Average
37357 
Above Average
32101 
Good Quality
20800 
Very Good
8976 
Excellent
 
3390
Other values (8)
4072 

Length

Max length13
Median length12
Mean length10.11468096
Min length7

Characters and Unicode

Total characters1079196
Distinct characters32
Distinct categories4
Distinct scripts2
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowVery Good
2nd rowVery Good
3rd rowVery Good
4th rowVery Good
5th rowVery Good

Common Values

ValueCountFrequency (%)
Average37357
23.5%
Above Average32101
20.2%
Good Quality20800
 
13.1%
Very Good8976
 
5.6%
Excellent3390
 
2.1%
Superior2634
 
1.7%
Exceptional-A818
 
0.5%
Exceptional-B278
 
0.2%
Fair Quality150
 
0.1%
Exceptional-C92
 
0.1%
Other values (3)100
 
0.1%
(Missing)52261
32.9%

Length
[Figure: histogram of lengths of the category]

Most occurring words

ValueCountFrequency (%)
average69458
41.2%
above32101
19.0%
good29776
17.6%
quality20956
 
12.4%
very8976
 
5.3%
excellent3390
 
2.0%
superior2634
 
1.6%
exceptional-a818
 
0.5%
exceptional-b278
 
0.2%
fair150
 
0.1%
Other values (5)211
 
0.1%

Most occurring characters

ValueCountFrequency (%)
e190670
17.7%
A102377
9.5%
v101559
9.4%
o95575
8.9%
a91865
8.5%
r83852
 
7.8%
g69458
 
6.4%
(space)62052
 
5.7%
b32101
 
3.0%
y29932
 
2.8%
Other values (22)219755
20.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter845870
78.4%
Uppercase Letter170011
 
15.8%
Space Separator62052
 
5.7%
Dash Punctuation1263
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e190670
22.5%
v101559
12.0%
o95575
11.3%
a91865
10.9%
r83852
9.9%
g69458
 
8.2%
b32101
 
3.8%
y29932
 
3.5%
d29776
 
3.5%
l28999
 
3.4%
Other values (8)92083
10.9%
Uppercase Letter
ValueCountFrequency (%)
A102377
60.2%
G29776
 
17.5%
Q20956
 
12.3%
V8976
 
5.3%
E4653
 
2.7%
S2634
 
1.5%
B278
 
0.2%
F150
 
0.1%
D94
 
0.1%
C92
 
0.1%
Other values (2)25
 
< 0.1%
Space Separator
ValueCountFrequency (%)
(space)62052
100.0%
Dash Punctuation
ValueCountFrequency (%)
'-'1263
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1015881
94.1%
Common63315
 
5.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e190670
18.8%
A102377
10.1%
v101559
10.0%
o95575
9.4%
a91865
9.0%
r83852
8.3%
g69458
 
6.8%
b32101
 
3.2%
y29932
 
2.9%
G29776
 
2.9%
Other values (20)188716
18.6%
Common
ValueCountFrequency (%)
(space)62052
98.0%
'-'1263
 
2.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII1079196
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e190670
17.7%
A102377
9.5%
v101559
9.4%
o95575
8.9%
a91865
8.5%
r83852
 
7.8%
g69458
 
6.4%
(space)62052
 
5.7%
b32101
 
3.0%
y29932
 
2.8%
Other values (22)219755
20.4%

CNDTN
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct7
Distinct (%)< 0.1%
Missing52261
Missing (%)32.9%
Memory size1.2 MiB
Average
58217 
Good
37497 
Very Good
8130 
Excellent
 
1338
Fair
 
1320
Other values (2)
 
194

Length

Max length9
Median length7
Mean length6.08112769
Min length4

Characters and Unicode

Total characters648832
Distinct characters24
Distinct categories3
Distinct scripts2
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowGood
2nd rowGood
3rd rowVery Good
4th rowGood
5th rowGood

Common Values

ValueCountFrequency (%)
Average58217
36.6%
Good37497
23.6%
Very Good8130
 
5.1%
Excellent1338
 
0.8%
Fair1320
 
0.8%
Poor175
 
0.1%
Default19
 
< 0.1%
(Missing)52261
32.9%

Length
[Figure: histogram of lengths of the category]
[Figure: pie chart]

Most occurring words

ValueCountFrequency (%)
average58217
50.7%
good45627
39.7%
very8130
 
7.1%
excellent1338
 
1.2%
fair1320
 
1.1%
poor175
 
0.2%
default19
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
e127259
19.6%
o91604
14.1%
r67842
10.5%
a59556
9.2%
A58217
9.0%
v58217
9.0%
g58217
9.0%
G45627
 
7.0%
d45627
 
7.0%
V8130
 
1.3%
Other values (14)28536
 
4.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter525876
81.0%
Uppercase Letter114826
 
17.7%
Space Separator8130
 
1.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e127259
24.2%
o91604
17.4%
r67842
12.9%
a59556
11.3%
v58217
11.1%
g58217
11.1%
d45627
 
8.7%
y8130
 
1.5%
l2695
 
0.5%
t1357
 
0.3%
Other values (6)5372
 
1.0%
Uppercase Letter
ValueCountFrequency (%)
A58217
50.7%
G45627
39.7%
V8130
 
7.1%
E1338
 
1.2%
F1320
 
1.1%
P175
 
0.2%
D19
 
< 0.1%
Space Separator
ValueCountFrequency (%)
(space)8130
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin640702
98.7%
Common8130
 
1.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e127259
19.9%
o91604
14.3%
r67842
10.6%
a59556
9.3%
A58217
9.1%
v58217
9.1%
g58217
9.1%
G45627
 
7.1%
d45627
 
7.1%
V8130
 
1.3%
Other values (13)20406
 
3.2%
Common
ValueCountFrequency (%)
(space)8130
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII648832
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e127259
19.6%
o91604
14.1%
r67842
10.5%
a59556
9.2%
A58217
9.0%
v58217
9.0%
g58217
9.0%
G45627
 
7.0%
d45627
 
7.0%
V8130
 
1.3%
Other values (14)28536
 
4.4%

EXTWALL
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct25
Distinct (%)< 0.1%
Missing52261
Missing (%)32.9%
Memory size1.2 MiB
Common Brick
81068 
Brick/Siding
 
5569
Vinyl Siding
 
5290
Wood Siding
 
4540
Stucco
 
3216
Other values (20)
 
7013

Length

Max length14
Median length12
Mean length11.61341569
Min length5

Characters and Unicode

Total characters1239105
Distinct characters35
Distinct categories4
Distinct scripts2
Distinct blocks1

Unique

Unique2
Unique (%)< 0.1%

Sample

1st rowCommon Brick
2nd rowCommon Brick
3rd rowCommon Brick
4th rowCommon Brick
5th rowCommon Brick

Common Values

ValueCountFrequency (%)
Common Brick81068
51.0%
Brick/Siding5569
 
3.5%
Vinyl Siding5290
 
3.3%
Wood Siding4540
 
2.9%
Stucco3216
 
2.0%
Shingle1181
 
0.7%
Brick Veneer1069
 
0.7%
Aluminum954
 
0.6%
Stone744
 
0.5%
Brick/Stucco673
 
0.4%
Other values (15)2392
 
1.5%
(Missing)52261
32.9%

Length
[Figure: histogram of lengths of the category]

Most occurring words

ValueCountFrequency (%)
brick82649
41.4%
common81068
40.6%
siding9896
 
5.0%
brick/siding5569
 
2.8%
vinyl5290
 
2.7%
wood4540
 
2.3%
stucco3267
 
1.6%
veneer1323
 
0.7%
shingle1181
 
0.6%
stone998
 
0.5%
Other values (16)3820
 
1.9%

Most occurring characters

ValueCountFrequency (%)
o177988
14.4%
m164044
13.2%
i128551
10.4%
n107957
8.7%
c98627
8.0%
(space)92905
7.5%
r91215
7.4%
B89622
7.2%
k89622
7.2%
C81204
6.6%
Other values (25)117370
9.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter931708
75.2%
Uppercase Letter207047
 
16.7%
Space Separator92905
 
7.5%
Other Punctuation7445
 
0.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o177988
19.1%
m164044
17.6%
i128551
13.8%
n107957
11.6%
c98627
10.6%
r91215
9.8%
k89622
9.6%
d20599
 
2.2%
g16986
 
1.8%
e8236
 
0.9%
Other values (10)27883
 
3.0%
Uppercase Letter
ValueCountFrequency (%)
B89622
43.3%
C81204
39.2%
S23365
 
11.3%
V6613
 
3.2%
W4540
 
2.2%
A956
 
0.5%
F512
 
0.2%
H119
 
0.1%
M66
 
< 0.1%
D32
 
< 0.1%
Other values (3)18
 
< 0.1%
Space Separator
ValueCountFrequency (%)
(space)92905
100.0%
Other Punctuation
ValueCountFrequency (%)
'/'7445
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1138755
91.9%
Common100350
 
8.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
o177988
15.6%
m164044
14.4%
i128551
11.3%
n107957
9.5%
c98627
8.7%
r91215
8.0%
B89622
7.9%
k89622
7.9%
C81204
7.1%
S23365
 
2.1%
Other values (23)86560
7.6%
Common
ValueCountFrequency (%)
(space)92905
92.6%
'/'7445
 
7.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII1239105
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o177988
14.4%
m164044
13.2%
i128551
10.4%
n107957
8.7%
c98627
8.0%
(space)92905
7.5%
r91215
7.4%
B89622
7.2%
k89622
7.2%
C81204
6.6%
Other values (25)117370
9.5%

ROOF
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct16
Distinct (%)< 0.1%
Missing52261
Missing (%)32.9%
Memory size1.2 MiB
Built Up
31402 
Comp Shingle
30301 
Metal- Sms
29957 
Slate
11135 
Neopren
 
1254
Other values (11)
 
2647

Length

Max length14
Median length10
Mean length9.359226213
Min length5

Characters and Unicode

Total characters998592
Distinct characters32
Distinct categories4
Distinct scripts2
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowMetal- Sms
2nd rowBuilt Up
3rd rowBuilt Up
4th rowBuilt Up
5th rowNeopren

Common Values

ValueCountFrequency (%)
Built Up31402
19.8%
Comp Shingle30301
19.1%
Metal- Sms29957
18.8%
Slate11135
 
7.0%
Neopren1254
 
0.8%
Shake907
 
0.6%
Clay Tile654
 
0.4%
Shingle433
 
0.3%
Metal- Pre244
 
0.2%
Typical229
 
0.1%
Other values (6)180
 
0.1%
(Missing)52261
32.9%

Length
[Figure: histogram of lengths of the category]

Most occurring words

ValueCountFrequency (%)
built31402
15.7%
up31402
15.7%
shingle30734
15.4%
comp30301
15.2%
metal30242
15.2%
sms29957
15.0%
slate11135
 
5.6%
neopren1254
 
0.6%
shake907
 
0.5%
tile671
 
0.3%
Other values (11)1425
 
0.7%

Most occurring characters

ValueCountFrequency (%)
l105067
 
10.5%
(space)92734
 
9.3%
e76492
 
7.7%
t72911
 
7.3%
S72740
 
7.3%
p63329
 
6.3%
i63240
 
6.3%
m60360
 
6.0%
a43176
 
4.3%
n32111
 
3.2%
Other values (22)316432
31.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter676172
67.7%
Uppercase Letter199437
 
20.0%
Space Separator92734
 
9.3%
Dash Punctuation30249
 
3.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
l105067
15.5%
e76492
11.3%
t72911
10.8%
p63329
9.4%
i63240
9.4%
m60360
8.9%
a43176
6.4%
n32111
 
4.7%
o32016
 
4.7%
h31641
 
4.7%
Other values (9)95829
14.2%
Uppercase Letter
ValueCountFrequency (%)
S72740
36.5%
B31402
15.7%
U31402
15.7%
C31119
15.6%
M30242
15.2%
N1254
 
0.6%
T900
 
0.5%
P253
 
0.1%
R102
 
0.1%
W16
 
< 0.1%
Dash Punctuation
ValueCountFrequency (%)
'-'30249
100.0%
Space Separator
ValueCountFrequency (%)
(space)92734
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin875609
87.7%
Common122983
 
12.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
l105067
12.0%
e76492
 
8.7%
t72911
 
8.3%
S72740
 
8.3%
p63329
 
7.2%
i63240
 
7.2%
m60360
 
6.9%
a43176
 
4.9%
n32111
 
3.7%
o32016
 
3.7%
Other values (20)254167
29.0%
Common
ValueCountFrequency (%)
(space)92734
75.4%
'-'30249
 
24.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII998592
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
l105067
 
10.5%
(space)92734
 
9.3%
e76492
 
7.7%
t72911
 
7.3%
S72740
 
7.3%
p63329
 
6.3%
i63240
 
6.3%
m60360
 
6.0%
a43176
 
4.3%
n32111
 
3.2%
Other values (22)316432
31.7%

INTWALL
Categorical

HIGH CORRELATION
MISSING

Distinct12
Distinct (%)< 0.1%
Missing52261
Missing (%)32.9%
Memory size1.2 MiB
Hardwood
83643 
Hardwood/Carp
10938 
Wood Floor
 
8170
Carpet
 
3563
Lt Concrete
 
141
Other values (7)
 
241

Length

Max length13
Median length8
Mean length8.604540001
Min length6

Characters and Unicode

Total characters918070
Distinct characters33
Distinct categories4
Distinct scripts2
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowHardwood
2nd rowHardwood
3rd rowHardwood
4th rowHardwood
5th rowHardwood

Common Values

ValueCountFrequency (%)
Hardwood83643
52.6%
Hardwood/Carp10938
 
6.9%
Wood Floor8170
 
5.1%
Carpet3563
 
2.2%
Lt Concrete141
 
0.1%
Default110
 
0.1%
Ceramic Tile50
 
< 0.1%
Vinyl Comp28
 
< 0.1%
Parquet19
 
< 0.1%
Resiliant15
 
< 0.1%
Other values (2)19
 
< 0.1%
(Missing)52261
32.9%

Length
[Figure: histogram of lengths of the category]

Most occurring words

ValueCountFrequency (%)
hardwood83643
72.7%
hardwood/carp10938
 
9.5%
floor8170
 
7.1%
wood8170
 
7.1%
carpet3563
 
3.1%
concrete141
 
0.1%
lt141
 
0.1%
default110
 
0.1%
tile50
 
< 0.1%
ceramic50
 
< 0.1%
Other values (6)122
 
0.1%

Most occurring characters

ValueCountFrequency (%)
o222017
24.2%
d197332
21.5%
r117474
12.8%
a109282
11.9%
H94581
10.3%
w94581
10.3%
C14720
 
1.6%
p14529
 
1.6%
'/'10938
 
1.2%
(space)8402
 
0.9%
Other values (23)34214
 
3.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter772694
84.2%
Uppercase Letter126036
 
13.7%
Other Punctuation10938
 
1.2%
Space Separator8402
 
0.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o222017
28.7%
d197332
25.5%
r117474
15.2%
a109282
14.1%
w94581
12.2%
p14529
 
1.9%
l8386
 
1.1%
e4121
 
0.5%
t4002
 
0.5%
n197
 
< 0.1%
Other values (10)773
 
0.1%
Uppercase Letter
ValueCountFrequency (%)
H94581
75.0%
C14720
 
11.7%
W8170
 
6.5%
F8170
 
6.5%
L141
 
0.1%
D110
 
0.1%
T56
 
< 0.1%
V41
 
< 0.1%
P19
 
< 0.1%
R15
 
< 0.1%
Space Separator
ValueCountFrequency (%)
(space)8402
100.0%
Other Punctuation
ValueCountFrequency (%)
'/'10938
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin898730
97.9%
Common19340
 
2.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
o222017
24.7%
d197332
22.0%
r117474
13.1%
a109282
12.2%
H94581
10.5%
w94581
10.5%
C14720
 
1.6%
p14529
 
1.6%
l8386
 
0.9%
W8170
 
0.9%
Other values (21)17658
 
2.0%
Common
ValueCountFrequency (%)
'/'10938
56.6%
(space)8402
43.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII918070
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o222017
24.2%
d197332
21.5%
r117474
12.8%
a109282
11.9%
H94581
10.3%
w94581
10.3%
C14720
 
1.6%
p14529
 
1.6%
'/'10938
 
1.2%
(space)8402
 
0.9%
Other values (23)34214
 
3.7%

KITCHENS
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct8
Distinct (%)< 0.1%
Missing52262
Missing (%)32.9%
Infinite0
Infinite (%)0.0%
Mean1.219251136
Minimum0
Maximum44
Zeros117
Zeros (%)0.1%
Negative0
Negative (%)0.0%
Memory size1.2 MiB

Quantile statistics

Minimum0
5-th percentile1
Q1 1
median 1
Q3 1
95-th percentile2
Maximum44
Range44
Interquartile range (IQR)0

Descriptive statistics

Standard deviation0.6211695991
Coefficient of variation (CV)0.5094681321
Kurtosis220.6893857
Mean1.219251136
Median Absolute Deviation (MAD)0
Skewness6.102645696
Sum130088
Variance0.3858516708
MonotonicityNot monotonic
[Figure: histogram with fixed size bins (bins=8)]

Common values
Value: Count (Frequency %)
1: 90434 (56.9%)
2: 11904 (7.5%)
4: 3051 (1.9%)
3: 1173 (0.7%)
0: 117 (0.1%)
5: 11 (< 0.1%)
6: 4 (< 0.1%)
44: 1 (< 0.1%)
(Missing): 52262 (32.9%)

Smallest values
Value: Count (Frequency %)
0: 117 (0.1%)
1: 90434 (56.9%)
2: 11904 (7.5%)
3: 1173 (0.7%)
4: 3051 (1.9%)
5: 11 (< 0.1%)
6: 4 (< 0.1%)
44: 1 (< 0.1%)

Largest values
Value: Count (Frequency %)
44: 1 (< 0.1%)
6: 4 (< 0.1%)
5: 11 (< 0.1%)
4: 3051 (1.9%)
3: 1173 (0.7%)
2: 11904 (7.5%)
1: 90434 (56.9%)
0: 117 (0.1%)

FIREPLACES
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
SKEWED
ZEROS

Distinct20
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean2.374673654
Minimum0
Maximum293920
Zeros103837
Zeros (%)65.3%
Negative0
Negative (%)0.0%
Memory size1.2 MiB

Quantile statistics

Minimum0
5-th percentile0
Q1 0
median 0
Q3 1
95-th percentile2
Maximum293920
Range293920
Interquartile range (IQR)1

Descriptive statistics

Standard deviation737.2955949
Coefficient of variation (CV)310.4829136
Kurtosis158879.2742
Mean2.374673654
Median Absolute Deviation (MAD)0
Skewness398.5490354
Sum377471
Variance543604.7943
MonotonicityNot monotonic
[Figure: histogram with fixed size bins (bins=20)]

Common values
Value: Count (Frequency %)
0: 103837 (65.3%)
1: 40567 (25.5%)
2: 10779 (6.8%)
3: 2410 (1.5%)
4: 841 (0.5%)
5: 277 (0.2%)
6: 148 (0.1%)
7: 47 (< 0.1%)
8: 18 (< 0.1%)
9: 10 (< 0.1%)
Other values (10): 23 (< 0.1%)

Smallest values
Value: Count (Frequency %)
0: 103837 (65.3%)
1: 40567 (25.5%)
2: 10779 (6.8%)
3: 2410 (1.5%)
4: 841 (0.5%)
5: 277 (0.2%)
6: 148 (0.1%)
7: 47 (< 0.1%)
8: 18 (< 0.1%)
9: 10 (< 0.1%)

Largest values
Value: Count (Frequency %)
293920: 1 (< 0.1%)
4068: 1 (< 0.1%)
1601: 1 (< 0.1%)
1017: 1 (< 0.1%)
922: 1 (< 0.1%)
200: 1 (< 0.1%)
13: 3 (< 0.1%)
12: 3 (< 0.1%)
11: 3 (< 0.1%)
10: 8 (< 0.1%)

USECODE
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct16
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean14.25299924
Minimum11
Maximum117
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size1.2 MiB

Quantile statistics

Minimum11
5-th percentile11
Q1 11
median 13
Q3 17
95-th percentile24
Maximum117
Range106
Interquartile range (IQR)6

Descriptive statistics

Standard deviation3.725735883
Coefficient of variation (CV)0.261400132
Kurtosis37.24269694
Mean14.25299924
Median Absolute Deviation (MAD)2
Skewness2.556818989
Sum2265614
Variance13.88110787
MonotonicityNot monotonic
[Figure: histogram with fixed size bins (bins=16)]

Common values
Value: Count (Frequency %)
11: 45597 (28.7%)
12: 31623 (19.9%)
17: 27511 (17.3%)
16: 24741 (15.6%)
13: 16588 (10.4%)
24: 8272 (5.2%)
23: 4497 (2.8%)
15: 79 (< 0.1%)
19: 31 (< 0.1%)
117: 8 (< 0.1%)
Other values (6): 10 (< 0.1%)

Smallest values
Value: Count (Frequency %)
11: 45597 (28.7%)
12: 31623 (19.9%)
13: 16588 (10.4%)
15: 79 (< 0.1%)
16: 24741 (15.6%)
17: 27511 (17.3%)
19: 31 (< 0.1%)
23: 4497 (2.8%)
24: 8272 (5.2%)
29: 1 (< 0.1%)

Largest values
Value: Count (Frequency %)
117: 8 (< 0.1%)
116: 1 (< 0.1%)
83: 2 (< 0.1%)
81: 4 (< 0.1%)
41: 1 (< 0.1%)
39: 1 (< 0.1%)
29: 1 (< 0.1%)
24: 8272 (5.2%)
23: 4497 (2.8%)
19: 31 (< 0.1%)

LANDAREA
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
SKEWED

Distinct11359
Distinct (%)7.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean2473.282158
Minimum0
Maximum942632
Zeros72
Zeros (%)< 0.1%
Negative0
Negative (%)0.0%
Memory size1.2 MiB

Quantile statistics

Minimum0
5-th percentile137
Q1 697
median 1649
Q3 3000
95-th percentile7475
Maximum942632
Range942632
Interquartile range (IQR)2303

Descriptive statistics

Standard deviation5059.046023
Coefficient of variation (CV)2.04547872
Kurtosis11264.01477
Mean2473.282158
Median Absolute Deviation (MAD)1092
Skewness78.59012056
Sum393145512
Variance25593946.66
MonotonicityNot monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values
Value: Count (Frequency %)
1800: 1071 (0.7%)
2000: 1020 (0.6%)
4000: 848 (0.5%)
5000: 833 (0.5%)
1600: 792 (0.5%)
2500: 601 (0.4%)
1700: 562 (0.4%)
1440: 552 (0.3%)
1500: 530 (0.3%)
3000: 511 (0.3%)
Other values (11349): 151637 (95.4%)

Smallest values
Value: Count (Frequency %)
0: 72 (< 0.1%)
1: 2 (< 0.1%)
2: 2 (< 0.1%)
3: 2 (< 0.1%)
4: 12 (< 0.1%)
5: 15 (< 0.1%)
6: 31 (< 0.1%)
7: 63 (< 0.1%)
8: 44 (< 0.1%)
9: 112 (0.1%)

Largest values
Value: Count (Frequency %)
942632: 1 (< 0.1%)
691817: 1 (< 0.1%)
498734: 1 (< 0.1%)
451804: 1 (< 0.1%)
339658: 1 (< 0.1%)
338435: 2 (< 0.1%)
329174: 1 (< 0.1%)
240377: 1 (< 0.1%)
227446: 1 (< 0.1%)
226479: 1 (< 0.1%)
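Fixed-size bins are equal-width intervals between the minimum and the maximum. For LANDAREA each of the 50 bins is 942,632 / 50 = 18,852.64 wide. The 95th percentile (7,475) lies inside the first bin, so at least 95% of rows fall into the first bar. This is why the histogram says little about such a skewed column. Here is a minimal sketch. It assumes the bins span exactly the observed minimum and maximum and that the last bin includes the maximum.

```python
def fixed_bin_edges(lo: float, hi: float, bins: int) -> list:
    """Equal-width bin edges between lo and hi (bins + 1 edges)."""
    width = (hi - lo) / bins
    return [lo + i * width for i in range(bins + 1)]

def bin_index(x: float, lo: float, hi: float, bins: int) -> int:
    """Index of the equal-width bin containing x; the last bin also includes hi."""
    if x == hi:
        return bins - 1
    return int((x - lo) / (hi - lo) * bins)

# LANDAREA range from the report: minimum 0, maximum 942632, 50 bins.
LO, HI, BINS = 0, 942632, 50
edges = fixed_bin_edges(LO, HI, BINS)
print(f"bin width: {edges[1] - edges[0]:.2f}")
print("bin of the 95th percentile (7475):", bin_index(7475, LO, HI, BINS))
```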

GIS_LAST_MOD_DTTM
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size1.2 MiB
2018-07-22 18:01:43
106696 
2018-07-22 18:01:38
52261 

Length

Max length19
Median length19
Mean length19
Min length19

Characters and Unicode

Total characters3020183
Distinct characters10
Distinct categories4
Distinct scripts1
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st row2018-07-22 18:01:43
2nd row2018-07-22 18:01:43
3rd row2018-07-22 18:01:43
4th row2018-07-22 18:01:43
5th row2018-07-22 18:01:43

Common Values

ValueCountFrequency (%)
'2018-07-22 18:01:43'106696
67.1%
'2018-07-22 18:01:38'52261
32.9%

Length
[Figure: histogram of lengths of the category]
[Figure: pie chart]

Most occurring words

ValueCountFrequency (%)
'2018-07-22'158957
50.0%
'18:01:43'106696
33.6%
'18:01:38'52261
16.4%

Most occurring characters

ValueCountFrequency (%)
'2'476871
15.8%
'0'476871
15.8%
'1'476871
15.8%
'8'370175
12.3%
'-'317914
10.5%
':'317914
10.5%
'7'158957
5.3%
(space)158957
5.3%
'3'158957
5.3%
'4'106696
3.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number2225398
73.7%
Dash Punctuation317914
 
10.5%
Other Punctuation317914
 
10.5%
Space Separator158957
 
5.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
'2'476871
21.4%
'0'476871
21.4%
'1'476871
21.4%
'8'370175
16.6%
'7'158957
7.1%
'3'158957
7.1%
'4'106696
4.8%
Dash Punctuation
ValueCountFrequency (%)
'-'317914
100.0%
Space Separator
ValueCountFrequency (%)
(space)158957
100.0%
Other Punctuation
ValueCountFrequency (%)
':'317914
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common3020183
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
'2'476871
15.8%
'0'476871
15.8%
'1'476871
15.8%
'8'370175
12.3%
'-'317914
10.5%
':'317914
10.5%
'7'158957
5.3%
(space)158957
5.3%
'3'158957
5.3%
'4'106696
3.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII3020183
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
'2'476871
15.8%
'0'476871
15.8%
'1'476871
15.8%
'8'370175
12.3%
'-'317914
10.5%
':'317914
10.5%
'7'158957
5.3%
(space)158957
5.3%
'3'158957
5.3%
'4'106696
3.5%

SOURCE
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size1.2 MiB
Residential
106696 
Condominium
52261 

Length

Max length11
Median length11
Mean length11
Min length11

Characters and Unicode

Total characters1748527
Distinct characters13
Distinct categories2
Distinct scripts1
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowResidential
2nd rowResidential
3rd rowResidential
4th rowResidential
5th rowResidential

Common Values

ValueCountFrequency (%)
Residential106696
67.1%
Condominium52261
32.9%
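SOURCE and GIS_LAST_MOD_DTTM split the rows identically: 106,696 and 52,261. Two other counts follow the same split:

- 52,261 is the missing count of GBA, STYLE, STRUCT, GRADE, CNDTN, EXTWALL, ROOF and INTWALL.
- CMPLX_NUM is missing in 106,696 rows.

This is consistent with residential and condominium records carrying different attributes. Marginal counts alone cannot prove a row-level match, but a cross-tabulation on the raw rows would. The following is a minimal stdlib sketch. Its input rows are hypothetical, since the report shows only totals.

```python
from collections import Counter

def crosstab(left, right):
    """Count co-occurring (left, right) value pairs, row by row."""
    return Counter(zip(left, right))

def is_one_to_one(pairs):
    """True if each left value maps to exactly one right value and vice versa."""
    lefts, rights = {}, {}
    for l, r in pairs:
        if lefts.setdefault(l, r) != r or rights.setdefault(r, l) != l:
            return False
    return True

# Hypothetical rows; substitute the real SOURCE and GIS_LAST_MOD_DTTM columns.
source = ["Residential", "Residential", "Condominium"]
modified = ["2018-07-22 18:01:43", "2018-07-22 18:01:43", "2018-07-22 18:01:38"]
table = crosstab(source, modified)
print(table)
print("one-to-one:", is_one_to_one(table))
```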

Length
[Figure: histogram of lengths of the category]
[Figure: pie chart]

Most occurring words

ValueCountFrequency (%)
residential106696
67.1%
condominium52261
32.9%

Most occurring characters

ValueCountFrequency (%)
i317914
18.2%
e213392
12.2%
n211218
12.1%
d158957
9.1%
R106696
 
6.1%
s106696
 
6.1%
t106696
 
6.1%
a106696
 
6.1%
l106696
 
6.1%
o104522
 
6.0%
Other values (3)209044
12.0%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 1589570 | 90.9%
Uppercase Letter | 158957 | 9.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
i | 317914 | 20.0%
e | 213392 | 13.4%
n | 211218 | 13.3%
d | 158957 | 10.0%
s | 106696 | 6.7%
t | 106696 | 6.7%
a | 106696 | 6.7%
l | 106696 | 6.7%
o | 104522 | 6.6%
m | 104522 | 6.6%
Uppercase Letter
Value | Count | Frequency (%)
R | 106696 | 67.1%
C | 52261 | 32.9%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 1748527 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1748527 | 100.0%

CMPLX_NUM
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 2913
Distinct (%): 5.6%
Missing: 106696
Missing (%): 67.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 2371.544249
Minimum: 1001
Maximum: 5621
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.2 MiB

Quantile statistics

Minimum: 1001
5-th percentile: 1066
Q1: 1501
median: 2265
Q3: 2910
95-th percentile: 5176
Maximum: 5621
Range: 4620
Interquartile range (IQR): 1409

Descriptive statistics

Standard deviation: 1114.272364
Coefficient of variation (CV): 0.469850969
Kurtosis: 1.140354537
Mean: 2371.544249
Median Absolute Deviation (MAD): 709
Skewness: 1.141672933
Sum: 123939274
Variance: 1241602.9
Monotonicity: Not monotonic
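
Several of the CMPLX_NUM statistics are derived from the others:

- CV is the standard deviation divided by the mean.
- The variance is the squared standard deviation.
- The IQR is Q3 - Q1, and the range is maximum - minimum.
- Sum is the mean times the number of non-missing values (158957 - 106696 = 52261).

The sketch below is a consistency check that recomputes these quantities from the figures reported above. The variable names are illustrative.

```python
# Figures copied from the CMPLX_NUM summary above.
n_rows, n_missing = 158957, 106696
mean, std = 2371.544249, 1114.272364
q1, q3 = 1501, 2910
minimum, maximum = 1001, 5621

n = n_rows - n_missing          # non-missing observations
cv = std / mean                 # coefficient of variation
variance = std ** 2             # variance is the squared standard deviation
iqr = q3 - q1                   # interquartile range
value_range = maximum - minimum
total = mean * n                # matches the reported Sum up to rounding of the mean

print(n, round(cv, 9), round(variance, 1), iqr, value_range, round(total))
```

The same relations can be checked for the other numeric variables in this report.
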
[Figure: histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
1066 | 720 | 0.5%
2423 | 615 | 0.4%
1080 | 429 | 0.3%
2282 | 423 | 0.3%
2838 | 396 | 0.2%
1657 | 360 | 0.2%
2279 | 324 | 0.2%
2661 | 302 | 0.2%
2898 | 292 | 0.2%
2430 | 291 | 0.2%
Other values (2903) | 48109 | 30.3%
(Missing) | 106696 | 67.1%

Minimum 10 values
Value | Count | Frequency (%)
1001 | 36 | < 0.1%
1002 | 157 | 0.1%
1003 | 16 | < 0.1%
1004 | 21 | < 0.1%
1005 | 3 | < 0.1%
1006 | 4 | < 0.1%
1007 | 8 | < 0.1%
1008 | 36 | < 0.1%
1009 | 101 | 0.1%
1010 | 97 | 0.1%

Maximum 10 values
Value | Count | Frequency (%)
5621 | 2 | < 0.1%
5620 | 4 | < 0.1%
5619 | 11 | < 0.1%
5617 | 4 | < 0.1%
5616 | 71 | < 0.1%
5615 | 5 | < 0.1%
5614 | 2 | < 0.1%
5612 | 2 | < 0.1%
5611 | 10 | < 0.1%
5610 | 10 | < 0.1%

LIVING_GBA
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 2216
Distinct (%): 4.2%
Missing: 106696
Missing (%): 67.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 888.834542
Minimum: 0
Maximum: 8553
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 1.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 440
Q1: 616
median: 783
Q3: 1060
95-th percentile: 1662
Maximum: 8553
Range: 8553
Interquartile range (IQR): 444

Descriptive statistics

Standard deviation: 420.1858218
Coefficient of variation (CV): 0.4727379528
Kurtosis: 15.69514905
Mean: 888.834542
Median Absolute Deviation (MAD): 206
Skewness: 2.556377435
Sum: 46451382
Variance: 176556.1248
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
888 | 205 | 0.1%
740 | 185 | 0.1%
1210 | 179 | 0.1%
670 | 175 | 0.1%
1332 | 168 | 0.1%
810 | 148 | 0.1%
575 | 145 | 0.1%
504 | 144 | 0.1%
625 | 143 | 0.1%
749 | 137 | 0.1%
Other values (2206) | 50632 | 31.9%
(Missing) | 106696 | 67.1%

Minimum 10 values
Value | Count | Frequency (%)
0 | 1 | < 0.1%
104 | 1 | < 0.1%
148 | 1 | < 0.1%
199 | 1 | < 0.1%
209 | 1 | < 0.1%
217 | 1 | < 0.1%
231 | 1 | < 0.1%
232 | 1 | < 0.1%
237 | 1 | < 0.1%
238 | 3 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
8553 | 1 | < 0.1%
7164 | 1 | < 0.1%
6145 | 1 | < 0.1%
6116 | 1 | < 0.1%
6034 | 1 | < 0.1%
6019 | 1 | < 0.1%
5991 | 1 | < 0.1%
5930 | 1 | < 0.1%
5857 | 1 | < 0.1%
5664 | 1 | < 0.1%

FULLADDRESS
Categorical

HIGH CARDINALITY
MISSING
UNIFORM

Distinct: 105978
Distinct (%): 99.9%
Missing: 52917
Missing (%): 33.3%
Memory size: 1.2 MiB

Length

Max length: 41
Median length: 20
Mean length: 20.21610713
Min length: 13

Characters and Unicode

Total characters: 2143716
Distinct characters: 39
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 105930
Unique (%): 99.9%

Sample

1st row: 1748 SWANN STREET NW
2nd row: 1746 SWANN STREET NW
3rd row: 1744 SWANN STREET NW
4th row: 1742 SWANN STREET NW
5th row: 1804 NEW HAMPSHIRE AVENUE NW

Common Values

Value | Count | Frequency (%)
1754 STANTON TERRACE SE | 5 | < 0.1%
1755 STANTON TERRACE SE | 5 | < 0.1%
1517 SHIPPEN LANE SE | 4 | < 0.1%
1508 SHIPPEN LANE SE | 4 | < 0.1%
1530 34TH STREET NW | 3 | < 0.1%
1507 TOBIAS DRIVE SE | 3 | < 0.1%
312 MILLERS COURT NE | 3 | < 0.1%
2600 TILDEN STREET NW | 3 | < 0.1%
435 1ST STREET SE | 2 | < 0.1%
3121 O STREET NW | 2 | < 0.1%
Other values (105968) | 106006 | 66.7%
(Missing) | 52917 | 33.3%

[Figure: histogram of lengths of the category]

Words (lowercased)
Value | Count | Frequency (%)
street | 70604 | 16.3%
nw | 50373 | 11.7%
ne | 32528 | 7.5%
se | 21799 | 5.0%
place | 14390 | 3.3%
avenue | 10741 | 2.5%
road | 4670 | 1.1%
terrace | 1673 | 0.4%
13th | 1451 | 0.3%
sw | 1340 | 0.3%
Other values (7137) | 222313 | 51.5%
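
The word table counts lowercased tokens across the non-missing addresses (street, nw, 13th, ...). The sketch below counts whitespace-separated tokens with `collections.Counter`. The profiler's actual tokenizer may differ, for example in how it treats the 562 "/" and 211 "'" characters found in this column. `word_counts` is an illustrative name.

```python
from collections import Counter


def word_counts(values):
    """Count lowercased, whitespace-separated tokens across the non-missing strings."""
    counts = Counter()
    for text in values:
        if text is None:  # missing address
            continue
        counts.update(text.lower().split())
    return counts


addresses = [
    "1748 SWANN STREET NW",
    "1746 SWANN STREET NW",
    "1804 NEW HAMPSHIRE AVENUE NW",
    None,
]
print(word_counts(addresses).most_common(3))
```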

Most occurring characters

Value | Count | Frequency (%)
(space) | 325842 | 15.2%
E | 285614 | 13.3%
T | 198655 | 9.3%
N | 142771 | 6.7%
R | 124381 | 5.8%
S | 123510 | 5.8%
1 | 88938 | 4.1%
A | 84258 | 3.9%
W | 60982 | 2.8%
2 | 60729 | 2.8%
Other values (29) | 648036 | 30.2%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 1375524 | 64.2%
Decimal Number | 441577 | 20.6%
Space Separator | 325842 | 15.2%
Other Punctuation | 773 | < 0.1%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
E | 285614 | 20.8%
T | 198655 | 14.4%
N | 142771 | 10.4%
R | 124381 | 9.0%
S | 123510 | 9.0%
A | 84258 | 6.1%
W | 60982 | 4.4%
L | 43615 | 3.2%
O | 43077 | 3.1%
H | 41873 | 3.0%
Other values (16) | 226788 | 16.5%
Decimal Number
Value | Count | Frequency (%)
1 | 88938 | 20.1%
2 | 60729 | 13.8%
3 | 59379 | 13.4%
4 | 50250 | 11.4%
0 | 44482 | 10.1%
5 | 39345 | 8.9%
6 | 29646 | 6.7%
7 | 25592 | 5.8%
8 | 22639 | 5.1%
9 | 20577 | 4.7%
Other Punctuation
Value | Count | Frequency (%)
/ | 562 | 72.7%
' | 211 | 27.3%
Space Separator
Value | Count | Frequency (%)
(space) | 325842 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 1375524 | 64.2%
Common | 768192 | 35.8%

Most frequent character per script

Common
Value | Count | Frequency (%)
(space) | 325842 | 42.4%
1 | 88938 | 11.6%
2 | 60729 | 7.9%
3 | 59379 | 7.7%
4 | 50250 | 6.5%
0 | 44482 | 5.8%
5 | 39345 | 5.1%
6 | 29646 | 3.9%
7 | 25592 | 3.3%
8 | 22639 | 2.9%
Other values (3) | 21350 | 2.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2143716 | 100.0%

CITY
Categorical

CONSTANT
HIGH CORRELATION
MISSING
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 52906
Missing (%): 33.3%
Memory size: 1.2 MiB

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 1060510
Distinct characters: 9
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: WASHINGTON
2nd row: WASHINGTON
3rd row: WASHINGTON
4th row: WASHINGTON
5th row: WASHINGTON

Common Values

Value | Count | Frequency (%)
WASHINGTON | 106051 | 66.7%
(Missing) | 52906 | 33.3%

[Figure: histogram of lengths of the category; pie chart]

Words (lowercased)
Value | Count | Frequency (%)
washington | 106051 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
N | 212102 | 20.0%
W | 106051 | 10.0%
A | 106051 | 10.0%
S | 106051 | 10.0%
H | 106051 | 10.0%
I | 106051 | 10.0%
G | 106051 | 10.0%
T | 106051 | 10.0%
O | 106051 | 10.0%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 1060510 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 1060510 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1060510 | 100.0%

STATE
Categorical

CONSTANT
HIGH CORRELATION
MISSING
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 52906
Missing (%): 33.3%
Memory size: 1.2 MiB

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 212102
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: DC
2nd row: DC
3rd row: DC
4th row: DC
5th row: DC

Common Values

Value | Count | Frequency (%)
DC | 106051 | 66.7%
(Missing) | 52906 | 33.3%

[Figure: histogram of lengths of the category; pie chart]

Words (lowercased)
Value | Count | Frequency (%)
dc | 106051 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
D | 106051 | 50.0%
C | 106051 | 50.0%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 212102 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 212102 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 212102 | 100.0%
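
CITY and STATE are flagged CONSTANT and REJECTED because, apart from their 52906 missing values, they hold only WASHINGTON and DC. The pandas sketch below drops such columns. `drop_constant_columns` is an illustrative helper, and the toy frame stands in for the real data. `nunique(dropna=True)` ignores missing values, consistent with CITY being reported as having 1 distinct value despite its missing rows.

```python
import pandas as pd


def drop_constant_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Drop columns that hold at most one distinct non-missing value."""
    n_distinct = df.nunique(dropna=True)  # NaN/None is not counted as a value
    constant = n_distinct[n_distinct <= 1].index
    return df.drop(columns=constant)


toy = pd.DataFrame({
    "CITY": ["WASHINGTON", None, "WASHINGTON"],
    "STATE": ["DC", None, "DC"],
    "WARD": ["Ward 2", "Ward 6", "Ward 2"],
})
print(drop_constant_columns(toy).columns.tolist())  # ['WARD']
```

The missing count of 52906 is shared by CITY, STATE, NATIONALGRID and CENSUS_BLOCK. Before dropping CITY and STATE, it may therefore be worth keeping a missingness indicator, in case that pattern carries information.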

ZIPCODE
Real number (ℝ≥0)

Distinct: 24
Distinct (%): < 0.1%
Missing: 1
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 20012.69456
Minimum: 20001
Maximum: 20392
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.2 MiB

Quantile statistics

Minimum: 20001
5-th percentile: 20001
Q1: 20007
median: 20011
Q3: 20018
95-th percentile: 20032
Maximum: 20392
Range: 391
Interquartile range (IQR): 11

Descriptive statistics

Standard deviation: 15.62708441
Coefficient of variation (CV): 0.0007808585878
Kurtosis: 403.503694
Mean: 20012.69456
Median Absolute Deviation (MAD): 6
Skewness: 16.86329334
Sum: 3181137877
Variance: 244.2057673
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=24)]

Common values
Value | Count | Frequency (%)
20011 | 16352 | 10.3%
20002 | 16310 | 10.3%
20009 | 13171 | 8.3%
20019 | 12458 | 7.8%
20016 | 10644 | 6.7%
20001 | 10549 | 6.6%
20020 | 9805 | 6.2%
20007 | 9029 | 5.7%
20003 | 8015 | 5.0%
20008 | 6801 | 4.3%
Other values (14) | 45822 | 28.8%

Minimum 10 values
Value | Count | Frequency (%)
20001 | 10549 | 6.6%
20002 | 16310 | 10.3%
20003 | 8015 | 5.0%
20004 | 1082 | 0.7%
20005 | 3404 | 2.1%
20006 | 118 | 0.1%
20007 | 9029 | 5.7%
20008 | 6801 | 4.3%
20009 | 13171 | 8.3%
20010 | 6428 | 4.0%

Maximum 10 values
Value | Count | Frequency (%)
20392 | 186 | 0.1%
20052 | 19 | < 0.1%
20037 | 3730 | 2.3%
20036 | 1892 | 1.2%
20032 | 5111 | 3.2%
20024 | 3105 | 2.0%
20020 | 9805 | 6.2%
20019 | 12458 | 7.8%
20018 | 5670 | 3.6%
20017 | 5622 | 3.5%
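
ZIPCODE is profiled as a real number, so statistics such as the mean (20012.69456), skewness and kurtosis treat postal codes as quantities. The sketch below converts each value to a five-digit string so the column can be profiled as categorical. It assumes the raw column holds floats such as 20011.0, which the "Real number" type suggests but does not prove. `zip_to_str` is an illustrative name.

```python
import math


def zip_to_str(value):
    """Turn a ZIP code stored as a number (e.g. 20011.0) into a 5-character string."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None  # keep the missing value missing
    return f"{int(value):05d}"


print([zip_to_str(v) for v in [20011.0, 20392.0, float("nan")]])
```

With pandas, mapping this function over the column with `Series.map` is one way to apply it.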

NATIONALGRID
Categorical

HIGH CARDINALITY
MISSING
UNIFORM

Distinct: 105949
Distinct (%): 99.9%
Missing: 52906
Missing (%): 33.3%
Memory size: 1.2 MiB

Length

Max length: 18
Median length: 18
Mean length: 18
Min length: 18

Characters and Unicode

Total characters: 1908918
Distinct characters: 15
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 105863
Unique (%): 99.8%

Sample

1st row: 18S UJ 23061 09289
2nd row: 18S UJ 23067 09289
3rd row: 18S UJ 23074 09289
4th row: 18S UJ 23078 09288
5th row: 18S UJ 23188 09253

Common Values

Value | Count | Frequency (%)
18S UJ 28168 01936 | 5 | < 0.1%
18S UJ 28233 01950 | 5 | < 0.1%
18S UJ 28025 01949 | 4 | < 0.1%
18S UJ 28045 01888 | 4 | < 0.1%
18S UJ 25398 04622 | 4 | < 0.1%
18S UJ 26425 06527 | 3 | < 0.1%
18S UJ 28027 01972 | 3 | < 0.1%
18S UJ 21962 12164 | 3 | < 0.1%
18S UJ 20689 08775 | 3 | < 0.1%
18S UJ 29362 04313 | 2 | < 0.1%
Other values (105939) | 106015 | 66.7%
(Missing) | 52906 | 33.3%
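
NATIONALGRID values follow the U.S. National Grid layout, which is based on MGRS. Each value has four parts:

- a grid zone designator (18S);
- a 100 km square identifier (UJ in 104654 rows and UH in 1397, per the word table that follows);
- a five-digit easting and a five-digit northing, which give 1 m precision within the square.

The sketch below parses that layout with a regular expression. Converting to latitude/longitude needs a projection library and is not shown. `GridRef`, `GRID_RE` and `parse_grid` are illustrative names.

```python
import re
from typing import NamedTuple, Optional


class GridRef(NamedTuple):
    zone: str        # grid zone designator, e.g. "18S"
    square: str      # 100 km square identifier, e.g. "UJ"
    easting_m: int   # metres east within the 100 km square
    northing_m: int  # metres north within the 100 km square


# Covers only the 18-character, 1 m precision form seen in this column.
GRID_RE = re.compile(r"^(\d{1,2}[A-Z]) ([A-Z]{2}) (\d{5}) (\d{5})$")


def parse_grid(ref: str) -> Optional[GridRef]:
    """Split a grid reference into its parts; return None if it does not match."""
    m = GRID_RE.match(ref)
    if m is None:
        return None
    zone, square, easting, northing = m.groups()
    return GridRef(zone, square, int(easting), int(northing))


print(parse_grid("18S UJ 28168 01936"))
```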

[Figure: histogram of lengths of the category]

Words (lowercased)
Value | Count | Frequency (%)
18s | 106051 | 25.0%
uj | 104654 | 24.7%
uh | 1397 | 0.3%
13982 | 42 | < 0.1%
09873 | 37 | < 0.1%
24647 | 37 | < 0.1%
26535 | 37 | < 0.1%
24964 | 36 | < 0.1%
27261 | 36 | < 0.1%
07090 | 35 | < 0.1%
Other values (32714) | 211842 | 49.9%

Most occurring characters

Value | Count | Frequency (%)
(space) | 318153 | 16.7%
1 | 247619 | 13.0%
8 | 191493 | 10.0%
2 | 163655 | 8.6%
0 | 140659 | 7.4%
S | 106051 | 5.6%
U | 106051 | 5.6%
J | 104654 | 5.5%
3 | 97866 | 5.1%
7 | 89267 | 4.7%
Other values (5) | 343450 | 18.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 1272612 | 66.7%
Uppercase Letter | 318153 | 16.7%
Space Separator | 318153 | 16.7%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 247619 | 19.5%
8 | 191493 | 15.0%
2 | 163655 | 12.9%
0 | 140659 | 11.1%
3 | 97866 | 7.7%
7 | 89267 | 7.0%
4 | 87068 | 6.8%
6 | 86294 | 6.8%
9 | 84497 | 6.6%
5 | 84194 | 6.6%
Uppercase Letter
Value | Count | Frequency (%)
S | 106051 | 33.3%
U | 106051 | 33.3%
J | 104654 | 32.9%
H | 1397 | 0.4%
Space Separator
Value | Count | Frequency (%)
(space) | 318153 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 1590765 | 83.3%
Latin | 318153 | 16.7%

Most frequent character per script

Common
Value | Count | Frequency (%)
(space) | 318153 | 20.0%
1 | 247619 | 15.6%
8 | 191493 | 12.0%
2 | 163655 | 10.3%
0 | 140659 | 8.8%
3 | 97866 | 6.2%
7 | 89267 | 5.6%
4 | 87068 | 5.5%
6 | 86294 | 5.4%
9 | 84497 | 5.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1908918 | 100.0%

LATITUDE
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 105522
Distinct (%): 66.4%
Missing: 1
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 38.91485395
Minimum: 38.81973129
Maximum: 38.99553969
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.2 MiB

Quantile statistics

Minimum: 38.81973129
5-th percentile: 38.85933517
Q1: 38.89542487
median: 38.91533652
Q3: 38.93607485
95-th percentile: 38.96507244
Maximum: 38.99553969
Range: 0.1758084
Interquartile range (IQR): 0.04064998

Descriptive statistics

Standard deviation: 0.03172261554
Coefficient of variation (CV): 0.0008151801258
Kurtosis: 0.0225011778
Mean: 38.91485395
Median Absolute Deviation (MAD): 0.02030494
Skewness: -0.2981973416
Sum: 6185749.525
Variance: 0.001006324337
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
38.93466835 | 1128 | 0.7%
38.88082098 | 1022 | 0.6%
38.9094433 | 592 | 0.4%
38.8747892 | 524 | 0.3%
38.90314058 | 504 | 0.3%
38.94449932 | 429 | 0.3%
38.89542487 | 428 | 0.3%
38.86303776 | 410 | 0.3%
38.90445577 | 406 | 0.3%
38.92806083 | 367 | 0.2%
Other values (105512) | 153146 | 96.3%

Minimum 10 values
Value | Count | Frequency (%)
38.81973129 | 1 | < 0.1%
38.81978931 | 1 | < 0.1%
38.81988895 | 1 | < 0.1%
38.819943 | 1 | < 0.1%
38.81995335 | 1 | < 0.1%
38.82001938 | 1 | < 0.1%
38.82006029 | 1 | < 0.1%
38.82011381 | 1 | < 0.1%
38.82014001 | 1 | < 0.1%
38.82020559 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
38.99553969 | 1 | < 0.1%
38.9954352 | 1 | < 0.1%
38.99530086 | 1 | < 0.1%
38.99516273 | 1 | < 0.1%
38.99503065 | 1 | < 0.1%
38.99497139 | 1 | < 0.1%
38.99489423 | 1 | < 0.1%
38.99484815 | 1 | < 0.1%
38.99479729 | 1 | < 0.1%
38.99475116 | 1 | < 0.1%

LONGITUDE
Real number (ℝ)

HIGH CORRELATION

Distinct: 105935
Distinct (%): 66.6%
Missing: 1
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: -77.01667632
Minimum: -77.11390873
Maximum: -76.90975796
Zeros: 0
Zeros (%): 0.0%
Negative: 158956
Negative (%): > 99.9%
Memory size: 1.2 MiB

Quantile statistics

Minimum: -77.11390873
5-th percentile: -77.08320993
Q1: -77.0428921
median: -77.01959633
Q3: -76.98862646
95-th percentile: -76.94106208
Maximum: -76.90975796
Range: 0.20415077
Interquartile range (IQR): 0.0542656425

Descriptive statistics

Standard deviation: 0.04093841016
Coefficient of variation (CV): -0.0005315525431
Kurtosis: -0.3879446439
Mean: -77.01667632
Median Absolute Deviation (MAD): 0.028281505
Skewness: 0.1670056934
Sum: -12242262.8
Variance: 0.001675953426
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
-77.08468826 | 1128 | 0.7%
-77.01427073 | 1022 | 0.6%
-77.03969267 | 592 | 0.4%
-77.01630113 | 524 | 0.3%
-77.01777614 | 504 | 0.3%
-77.06124775 | 429 | 0.3%
-77.02156757 | 428 | 0.3%
-76.94956535 | 410 | 0.3%
-77.03105732 | 406 | 0.3%
-77.0792663 | 367 | 0.2%
Other values (105925) | 153146 | 96.3%

Minimum 10 values
Value | Count | Frequency (%)
-77.11390873 | 1 | < 0.1%
-77.1138097 | 1 | < 0.1%
-77.11377421 | 1 | < 0.1%
-77.1136275 | 1 | < 0.1%
-77.11356932 | 1 | < 0.1%
-77.113389 | 1 | < 0.1%
-77.11332066 | 1 | < 0.1%
-77.1132754 | 1 | < 0.1%
-77.11327046 | 1 | < 0.1%
-77.11318887 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
-76.90975796 | 1 | < 0.1%
-76.9097583 | 1 | < 0.1%
-76.90984266 | 1 | < 0.1%
-76.90984731 | 1 | < 0.1%
-76.90988281 | 1 | < 0.1%
-76.90989558 | 1 | < 0.1%
-76.9099699 | 1 | < 0.1%
-76.90998346 | 1 | < 0.1%
-76.91001813 | 1 | < 0.1%
-76.91002365 | 1 | < 0.1%

ASSESSMENT_NBHD
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct: 57
Distinct (%): < 0.1%
Missing: 1
Missing (%): < 0.1%
Memory size: 1.2 MiB

Length

Max length: 28
Median length: 10
Mean length: 11.51869071
Min length: 4

Characters and Unicode

Total characters: 1830965
Distinct characters: 50
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: Old City 2
2nd row: Old City 2
3rd row: Old City 2
4th row: Old City 2
5th row: Old City 2

Common Values

Value | Count | Frequency (%)
Old City 2 | 15978 | 10.1%
Old City 1 | 15000 | 9.4%
Columbia Heights | 9474 | 6.0%
Brookland | 6568 | 4.1%
Petworth | 6323 | 4.0%
Deanwood | 5983 | 3.8%
Chevy Chase | 5354 | 3.4%
Congress Heights | 4729 | 3.0%
Brightwood | 4112 | 2.6%
Mt. Pleasant | 4052 | 2.5%
Other values (47) | 81383 | 51.2%

[Figure: histogram of lengths of the category]

Words (lowercased)
Value | Count | Frequency (%)
city | 30978 | 10.2%
old | 30978 | 10.2%
heights | 24847 | 8.2%
1 | 18132 | 6.0%
2 | 15978 | 5.3%
park | 15730 | 5.2%
columbia | 9474 | 3.1%
brookland | 6568 | 2.2%
petworth | 6323 | 2.1%
deanwood | 5983 | 2.0%
Other values (67) | 137659 | 45.5%

Most occurring characters

Value | Count | Frequency (%)
t | 149450 | 8.2%
(space) | 143694 | 7.8%
e | 129510 | 7.1%
i | 127197 | 6.9%
l | 118398 | 6.5%
o | 113961 | 6.2%
a | 106761 | 5.8%
r | 104884 | 5.7%
d | 75993 | 4.2%
s | 73438 | 4.0%
Other values (40) | 687679 | 37.6%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 1372729 | 75.0%
Uppercase Letter | 263832 | 14.4%
Space Separator | 143694 | 7.8%
Decimal Number | 41026 | 2.2%
Dash Punctuation | 5632 | 0.3%
Other Punctuation | 4052 | 0.2%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
t | 149450 | 10.9%
e | 129510 | 9.4%
i | 127197 | 9.3%
l | 118398 | 8.6%
o | 113961 | 8.3%
a | 106761 | 7.8%
r | 104884 | 7.6%
d | 75993 | 5.5%
s | 73438 | 5.3%
n | 67782 | 4.9%
Other values (13) | 305355 | 22.2%
Uppercase Letter
Value | Count | Frequency (%)
C | 73052 | 27.7%
H | 34945 | 13.2%
O | 32431 | 12.3%
P | 28807 | 10.9%
B | 15411 | 5.8%
D | 9408 | 3.6%
F | 9406 | 3.6%
W | 8626 | 3.3%
G | 7067 | 2.7%
S | 6952 | 2.6%
Other values (10) | 37727 | 14.3%
Decimal Number
Value | Count | Frequency (%)
1 | 20340 | 49.6%
2 | 15978 | 38.9%
3 | 2500 | 6.1%
6 | 2208 | 5.4%
Space Separator
Value | Count | Frequency (%)
(space) | 143694 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 5632 | 100.0%
Other Punctuation
Value | Count | Frequency (%)
. | 4052 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 1636561 | 89.4%
Common | 194404 | 10.6%

Most frequent character per script

Latin
Value | Count | Frequency (%)
t | 149450 | 9.1%
e | 129510 | 7.9%
i | 127197 | 7.8%
l | 118398 | 7.2%
o | 113961 | 7.0%
a | 106761 | 6.5%
r | 104884 | 6.4%
d | 75993 | 4.6%
s | 73438 | 4.5%
C | 73052 | 4.5%
Other values (33) | 563917 | 34.5%
Common
Value | Count | Frequency (%)
(space) | 143694 | 73.9%
1 | 20340 | 10.5%
2 | 15978 | 8.2%
- | 5632 | 2.9%
. | 4052 | 2.1%
3 | 2500 | 1.3%
6 | 2208 | 1.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1830965 | 100.0%

ASSESSMENT_SUBNBHD
Categorical

HIGH CARDINALITY
MISSING

Distinct: 121
Distinct (%): 0.1%
Missing: 32551
Missing (%): 20.5%
Memory size: 1.2 MiB

Length

Max length: 25
Median length: 16
Mean length: 17.13591127
Min length: 10

Characters and Unicode

Total characters: 2166082
Distinct characters: 54
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 040 D Old City 2
2nd row: 040 D Old City 2
3rd row: 040 D Old City 2
4th row: 040 D Old City 2
5th row: 040 D Old City 2

Common Values

Value | Count | Frequency (%)
040 D Old City 2 | 4403 | 2.8%
040 E Old City 2 | 2968 | 1.9%
040 C Old City 2 | 2886 | 1.8%
042 B Petworth | 2763 | 1.7%
039 K Old City 1 | 2640 | 1.7%
007 E Brookland | 2388 | 1.5%
040 B Old City 2 | 2289 | 1.4%
015 D Columbia Heights | 2246 | 1.4%
015 A Columbia Heights | 2206 | 1.4%
015 E Columbia Heights | 2183 | 1.4%
Other values (111) | 99434 | 62.6%
(Missing) | 32551 | 20.5%
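
Sub-neighborhood labels combine a three-digit code, a one-letter sub-area and a neighborhood name, for example 040 D Old City 2. The word table that follows supports reading the code as the parent neighborhood. There, the token 040 occurs 15978 times, the same as the Old City 2 count in ASSESSMENT_NBHD, and 039 occurs 15000 times, the same as Old City 1. The report does not state this mapping outright. The sketch below splits the labels with a regex whose layout is an assumption inferred from the sample values. `split_subnbhd` is an illustrative name.

```python
import re
from typing import Optional, Tuple

# Assumed layout, inferred from the sample values: 3-digit code, 1-letter sub-area, name.
SUBNBHD_RE = re.compile(r"^(\d{3}) ([A-Z]) (.+)$")


def split_subnbhd(value: str) -> Optional[Tuple[str, str, str]]:
    """Split e.g. '040 D Old City 2' into ('040', 'D', 'Old City 2'); None if it does not fit."""
    m = SUBNBHD_RE.match(value)
    return m.groups() if m else None


print(split_subnbhd("040 D Old City 2"))
```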

[Figure: histogram of lengths of the category]

Words (lowercased)
Value | Count | Frequency (%)
b | 32080 | 6.5%
city | 30978 | 6.3%
old | 30978 | 6.3%
a | 29585 | 6.0%
c | 25558 | 5.2%
heights | 23663 | 4.8%
040 | 15978 | 3.2%
2 | 15978 | 3.2%
1 | 15000 | 3.0%
039 | 15000 | 3.0%
Other values (79) | 259231 | 52.5%

Most occurring characters

Value | Count | Frequency (%)
(space) | 367623 | 17.0%
0 | 165735 | 7.7%
t | 115579 | 5.3%
i | 106591 | 4.9%
e | 94632 | 4.4%
o | 90288 | 4.2%
l | 88407 | 4.1%
C | 85277 | 3.9%
a | 73869 | 3.4%
d | 68048 | 3.1%
Other values (44) | 910033 | 42.0%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 1045358 | 48.3%
Decimal Number | 414612 | 19.1%
Space Separator | 367623 | 17.0%
Uppercase Letter | 334437 | 15.4%
Other Punctuation | 4052 | 0.2%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
C | 85277 | 25.5%
B | 45279 | 13.5%
A | 34103 | 10.2%
H | 32931 | 9.8%
O | 30978 | 9.3%
D | 22102 | 6.6%
P | 18322 | 5.5%
E | 13494 | 4.0%
F | 7174 | 2.1%
M | 6780 | 2.0%
Other values (11) | 37997 | 11.4%
Lowercase Letter
Value | Count | Frequency (%)
t | 115579 | 11.1%
i | 106591 | 10.2%
e | 94632 | 9.1%
o | 90288 | 8.6%
l | 88407 | 8.5%
a | 73869 | 7.1%
d | 68048 | 6.5%
r | 61254 | 5.9%
s | 58370 | 5.6%
n | 51075 | 4.9%
Other values (11) | 237245 | 22.7%
Decimal Number
Value | Count | Frequency (%)
0 | 165735 | 40.0%
1 | 55999 | 13.5%
2 | 46301 | 11.2%
4 | 31938 | 7.7%
3 | 30310 | 7.3%
9 | 26573 | 6.4%
5 | 22808 | 5.5%
6 | 17849 | 4.3%
8 | 10518 | 2.5%
7 | 6581 | 1.6%
Space Separator
Value | Count | Frequency (%)
(space) | 367623 | 100.0%
Other Punctuation
Value | Count | Frequency (%)
. | 4052 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 1379795 | 63.7%
Common | 786287 | 36.3%

Most frequent character per script

Latin
Value | Count | Frequency (%)
t | 115579 | 8.4%
i | 106591 | 7.7%
e | 94632 | 6.9%
o | 90288 | 6.5%
l | 88407 | 6.4%
C | 85277 | 6.2%
a | 73869 | 5.4%
d | 68048 | 4.9%
r | 61254 | 4.4%
s | 58370 | 4.2%
Other values (32) | 537480 | 39.0%
Common
Value | Count | Frequency (%)
(space) | 367623 | 46.8%
0 | 165735 | 21.1%
1 | 55999 | 7.1%
2 | 46301 | 5.9%
4 | 31938 | 4.1%
3 | 30310 | 3.9%
9 | 26573 | 3.4%
5 | 22808 | 2.9%
6 | 17849 | 2.3%
8 | 10518 | 1.3%
Other values (2) | 10633 | 1.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2166082 | 100.0%

CENSUS_TRACT
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 176
Distinct (%): 0.1%
Missing: 1
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 5348.216324
Minimum: 100
Maximum: 11100
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.2 MiB

Quantile statistics

Minimum: 100
5-th percentile: 502
Q1: 2102
median: 5201
Q3: 8302
95-th percentile: 10200
Maximum: 11100
Range: 11000
Interquartile range (IQR): 6200

Descriptive statistics

Standard deviation: 3369.645953
Coefficient of variation (CV): 0.6300504222
Kurtosis: -1.425048885
Mean: 5348.216324
Median Absolute Deviation (MAD): 3100
Skewness: 0.007889343771
Sum: 850131074
Variance: 11354513.85
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
5500 | 2933 | 1.8%
801 | 2620 | 1.6%
1001 | 2552 | 1.6%
300 | 2182 | 1.4%
5301 | 2179 | 1.4%
100 | 2090 | 1.3%
1500 | 2081 | 1.3%
4400 | 1960 | 1.2%
1100 | 1879 | 1.2%
5201 | 1766 | 1.1%
Other values (166) | 136714 | 86.0%

Minimum 10 values
Value | Count | Frequency (%)
100 | 2090 | 1.3%
202 | 1684 | 1.1%
300 | 2182 | 1.4%
400 | 605 | 0.4%
501 | 519 | 0.3%
502 | 1023 | 0.6%
600 | 1291 | 0.8%
701 | 1290 | 0.8%
702 | 696 | 0.4%
801 | 2620 | 1.6%

Maximum 10 values
Value | Count | Frequency (%)
11100 | 1501 | 0.9%
11000 | 911 | 0.6%
10900 | 160 | 0.1%
10800 | 386 | 0.2%
10700 | 392 | 0.2%
10600 | 1317 | 0.8%
10500 | 1022 | 0.6%
10400 | 945 | 0.6%
10300 | 773 | 0.5%
10200 | 895 | 0.6%

CENSUS_BLOCK
Categorical

HIGH CARDINALITY
MISSING

Distinct: 3848
Distinct (%): 3.6%
Missing: 52906
Missing (%): 33.3%
Memory size: 1.2 MiB

Length

Max length: 11
Median length: 11
Mean length: 11
Min length: 11

Characters and Unicode

Total characters: 1166561
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 65
Unique (%): 0.1%

Sample

1st row: 004201 2006
2nd row: 004201 2006
3rd row: 004201 2006
4th row: 004201 2006
5th row: 004201 2006

Common Values

Value | Count | Frequency (%)
009000 1001 | 340 | 0.2%
009201 1004 | 312 | 0.2%
009509 3004 | 206 | 0.1%
009904 2009 | 204 | 0.1%
009508 2005 | 195 | 0.1%
009000 1010 | 189 | 0.1%
007809 1001 | 175 | 0.1%
000300 3003 | 170 | 0.1%
009700 1012 | 160 | 0.1%
000801 2008 | 158 | 0.1%
Other values (3838) | 103942 | 65.4%
(Missing) | 52906 | 33.3%
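
CENSUS_BLOCK values pair a six-digit tract code with a four-digit block number, as in 009000 1001. By U.S. Census convention, the tract code carries an implied decimal point (009000 is tract 90.00), and the first digit of the block number is the block group. Converting the tract part to an integer gives values on the same scale as CENSUS_TRACT above (000801 becomes 801). The report alone does not show that the two columns agree row by row. The sketch below does the split; `CensusBlock` and `parse_census_block` are illustrative names.

```python
from typing import NamedTuple


class CensusBlock(NamedTuple):
    tract: int        # tract code without leading zeros, e.g. "009000" -> 9000
    block: str        # 4-digit block number, e.g. "1001"
    block_group: str  # first digit of the block number


def parse_census_block(value: str) -> CensusBlock:
    """Split a value such as '009000 1001' into tract, block and block group."""
    tract_code, block = value.split()  # raises ValueError unless there are exactly two parts
    if len(tract_code) != 6 or len(block) != 4 or not (tract_code + block).isdigit():
        raise ValueError(f"unexpected CENSUS_BLOCK value: {value!r}")
    return CensusBlock(int(tract_code), block, block[0])


cb = parse_census_block("000801 2008")
print(cb.tract, cb.block, cb.block_group)
```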

[Figure: histogram of lengths of the category]

Words (lowercased)
Value | Count | Frequency (%)
1001 | 3969 | 1.9%
1004 | 3712 | 1.8%
1002 | 3594 | 1.7%
2000 | 3378 | 1.6%
1003 | 3290 | 1.6%
2005 | 3139 | 1.5%
2002 | 3101 | 1.5%
1006 | 3030 | 1.4%
1005 | 2991 | 1.4%
2001 | 2960 | 1.4%
Other values (369) | 178938 | 84.4%

Most occurring characters

Value | Count | Frequency (%)
0 | 565517 | 48.5%
1 | 135208 | 11.6%
(space) | 106051 | 9.1%
2 | 97511 | 8.4%
3 | 52275 | 4.5%
9 | 43096 | 3.7%
4 | 41593 | 3.6%
7 | 35999 | 3.1%
8 | 32817 | 2.8%
5 | 28786 | 2.5%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 1060510 | 90.9%
Space Separator | 106051 | 9.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 565517 | 53.3%
1 | 135208 | 12.7%
2 | 97511 | 9.2%
3 | 52275 | 4.9%
9 | 43096 | 4.1%
4 | 41593 | 3.9%
7 | 35999 | 3.4%
8 | 32817 | 3.1%
5 | 28786 | 2.7%
6 | 27708 | 2.6%
Space Separator
Value | Count | Frequency (%)
(space) | 106051 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 1166561 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1166561 | 100.0%

WARD
Categorical

HIGH CORRELATION

Distinct: 8
Distinct (%): < 0.1%
Missing: 1
Missing (%): < 0.1%
Memory size: 1.2 MiB

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Characters and Unicode

Total characters: 953736
Distinct characters: 13
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Ward 2
2nd row: Ward 2
3rd row: Ward 2
4th row: Ward 2
5th row: Ward 2

Common Values

Value | Count | Frequency (%)
Ward 6 | 23973 | 15.1%
Ward 3 | 23688 | 14.9%
Ward 4 | 22202 | 14.0%
Ward 2 | 22167 | 13.9%
Ward 5 | 21359 | 13.4%
Ward 1 | 17455 | 11.0%
Ward 7 | 17206 | 10.8%
Ward 8 | 10906 | 6.9%
(Missing) | 1 | < 0.1%

[Figure: histogram of lengths of the category; pie chart]

Words (lowercased)
Value | Count | Frequency (%)
ward | 158956 | 50.0%
6 | 23973 | 7.5%
3 | 23688 | 7.5%
4 | 22202 | 7.0%
2 | 22167 | 7.0%
5 | 21359 | 6.7%
1 | 17455 | 5.5%
7 | 17206 | 5.4%
8 | 10906 | 3.4%

Most occurring characters

Value | Count | Frequency (%)
W | 158956 | 16.7%
a | 158956 | 16.7%
r | 158956 | 16.7%
d | 158956 | 16.7%
(space) | 158956 | 16.7%
6 | 23973 | 2.5%
3 | 23688 | 2.5%
4 | 22202 | 2.3%
2 | 22167 | 2.3%
5 | 21359 | 2.2%
Other values (3) | 45567 | 4.8%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 476868 | 50.0%
Uppercase Letter | 158956 | 16.7%
Space Separator | 158956 | 16.7%
Decimal Number | 158956 | 16.7%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
6 | 23973 | 15.1%
3 | 23688 | 14.9%
4 | 22202 | 14.0%
2 | 22167 | 13.9%
5 | 21359 | 13.4%
1 | 17455 | 11.0%
7 | 17206 | 10.8%
8 | 10906 | 6.9%
Lowercase Letter
Value | Count | Frequency (%)
a | 158956 | 33.3%
r | 158956 | 33.3%
d | 158956 | 33.3%
Uppercase Letter
Value | Count | Frequency (%)
W | 158956 | 100.0%
Space Separator
Value | Count | Frequency (%)
(space) | 158956 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 635824 | 66.7%
Common | 317912 | 33.3%

Most frequent character per script

Common
Value | Count | Frequency (%)
(space) | 158956 | 50.0%
6 | 23973 | 7.5%
3 | 23688 | 7.5%
4 | 22202 | 7.0%
2 | 22167 | 7.0%
5 | 21359 | 6.7%
1 | 17455 | 5.5%
7 | 17206 | 5.4%
8 | 10906 | 3.4%
Latin
Value | Count | Frequency (%)
W | 158956 | 25.0%
a | 158956 | 25.0%
r | 158956 | 25.0%
d | 158956 | 25.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 953736 | 100.0%

SQUARE
Unsupported

REJECTED
UNSUPPORTED

Missing: 0
Missing (%): 0.0%
Memory size: 1.2 MiB

X
Real number (ℝ)

HIGH CORRELATION

Distinct: 3291
Distinct (%): 2.1%
Missing: 237
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: -77.01671188
Minimum: -77.11313486
Maximum: -76.91051093
Zeros: 0
Zeros (%): 0.0%
Negative: 158720
Negative (%): 99.9%
Memory size: 1.2 MiB

Quantile statistics

Minimum: -77.11313486
5-th percentile: -77.08299299
Q1: -77.04289439
median: -77.01958148
Q3: -76.98884235
95-th percentile: -76.94112875
Maximum: -76.91051093
Range: 0.2026239377
Interquartile range (IQR): 0.05405204484

Descriptive statistics

Standard deviation: 0.04093318544
Coefficient of variation (CV): -0.0005314844589
Kurtosis: -0.3867357071
Mean: -77.01671188
Median Absolute Deviation (MAD): 0.02825954114
Skewness: 0.1677871257
Sum: -12224092.51
Variance: 0.00167552567
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
-77.08469058 | 1366 | 0.9%
-77.01427301 | 1022 | 0.6%
-77.0751295 | 721 | 0.5%
-77.03969497 | 600 | 0.4%
-76.95977324 | 559 | 0.4%
-77.01630341 | 524 | 0.3%
-77.01777843 | 504 | 0.3%
-77.06125007 | 430 | 0.3%
-77.02156986 | 428 | 0.3%
-76.94956761 | 415 | 0.3%
Other values (3281) | 152151 | 95.7%

Minimum 10 values
Value | Count | Frequency (%)
-77.11313486 | 24 | < 0.1%
-77.11177658 | 19 | < 0.1%
-77.11153572 | 25 | < 0.1%
-77.11045938 | 32 | < 0.1%
-77.11044559 | 17 | < 0.1%
-77.10924966 | 40 | < 0.1%
-77.10903087 | 19 | < 0.1%
-77.10835117 | 40 | < 0.1%
-77.10825751 | 52 | < 0.1%
-77.1081805 | 58 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
-76.91051093 | 39 | < 0.1%
-76.91143455 | 9 | < 0.1%
-76.91163527 | 24 | < 0.1%
-76.91276833 | 33 | < 0.1%
-76.9128207 | 17 | < 0.1%
-76.91303377 | 19 | < 0.1%
-76.91342738 | 2 | < 0.1%
-76.91417499 | 23 | < 0.1%
-76.91426893 | 8 | < 0.1%
-76.91437363 | 13 | < 0.1%

Y
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 3291
Distinct (%): 2.1%
Missing: 237
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 38.91484631
Minimum: 38.82057613
Maximum: 38.99364643
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.2 MiB

Quantile statistics

Minimum: 38.82057613
5-th percentile: 38.85937727
Q1: 38.89543232
median: 38.91522932
Q3: 38.93607698
95-th percentile: 38.9646814
Maximum: 38.99364643
Range: 0.1730703058
Interquartile range (IQR): 0.04064465254

Descriptive statistics

Standard deviation0.03168182178
Coefficient of variation (CV)0.0008141320032
Kurtosis0.02283935778
Mean38.91484631
Median Absolute Deviation (MAD)0.02021112457
Skewness-0.3010348985
Sum6176564.406
Variance0.001003737831
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value                Count   Frequency (%)
38.93467581          1366    0.9%
38.88082843          1022    0.6%
38.92025156          721     0.5%
38.90945076          600     0.4%
38.92507725          559     0.4%
38.87479665          524     0.3%
38.90314803          504     0.3%
38.94450678          430     0.3%
38.89543232          428     0.3%
38.92961295          415     0.3%
Other values (3281)  152151  95.7%
Value        Count  Frequency (%)
38.82057613  36     < 0.1%
38.82179948  115    0.1%
38.82189368  9      < 0.1%
38.82346224  20     < 0.1%
38.82377817  23     < 0.1%
38.82435976  61     < 0.1%
38.82459455  123    0.1%
38.82494627  40     < 0.1%
38.82554031  48     < 0.1%
38.82554343  15     < 0.1%
ValueCountFrequency (%)
38.9936464336
< 0.1%
38.9936033730
< 0.1%
38.9934345620
 
< 0.1%
38.9930303714
 
< 0.1%
38.991703414
 
< 0.1%
38.9916179342
< 0.1%
38.9911623911
 
< 0.1%
38.9900142451
< 0.1%
38.9897711641
< 0.1%
38.9897257345
< 0.1%

QUADRANT
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct4
Distinct (%)< 0.1%
Missing237
Missing (%)0.1%
Memory size1.2 MiB
NW  89736
NE  37675
SE  27224
SW  4085

Length

Max length2
Median length2
Mean length2
Min length2

Characters and Unicode

Total characters317440
Distinct characters4
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
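The character statistics for QUADRANT can be reproduced from the value counts alone. Below is a minimal sketch using Python's standard `unicodedata.category`, which returns the two-letter general category of a code point (`"Lu"` is "Uppercase Letter"). It assumes missing values are simply excluded, takes the QUADRANT counts reported in this section as input, and does not attempt the script and block breakdowns, which the standard library does not expose directly.

```python
import unicodedata
from collections import Counter

# QUADRANT value counts from this report (the 237 missing values are excluded)
value_counts = {"NW": 89736, "NE": 37675, "SE": 27224, "SW": 4085}

char_counts = Counter()      # occurrences of each character
category_counts = Counter()  # occurrences of each Unicode general category
for value, count in value_counts.items():
    for ch in value:
        char_counts[ch] += count
        # unicodedata.category("N") == "Lu", i.e. Letter, uppercase
        category_counts[unicodedata.category(ch)] += count

total_chars = sum(char_counts.values())
for ch, count in char_counts.most_common():
    print(f"{ch} {count} {count / total_chars:.1%}")
print(dict(category_counts))
```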

Unique

Unique0
Unique (%)0.0%

Sample

1st rowNW
2nd rowNW
3rd rowNW
4th rowNW
5th rowNW

Common Values

Value      Count  Frequency (%)
NW         89736  56.5%
NE         37675  23.7%
SE         27224  17.1%
SW         4085   2.6%
(Missing)  237    0.1%

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
nw     89736  56.5%
ne     37675  23.7%
se     27224  17.2%
sw     4085   2.6%

Most occurring characters

Value  Count   Frequency (%)
N      127411  40.1%
W      93821   29.6%
E      64899   20.4%
S      31309   9.9%

Most occurring categories

Value             Count   Frequency (%)
Uppercase Letter  317440  100.0%

Most frequent character per category

Uppercase Letter
Value  Count   Frequency (%)
N      127411  40.1%
W      93821   29.6%
E      64899   20.4%
S      31309   9.9%

Most occurring scripts

Value  Count   Frequency (%)
Latin  317440  100.0%

Most frequent character per script

Latin
Value  Count   Frequency (%)
N      127411  40.1%
W      93821   29.6%
E      64899   20.4%
S      31309   9.9%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  317440  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
N      127411  40.1%
W      93821   29.6%
E      64899   20.4%
S      31309   9.9%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables (a negative scale factor only flips its sign), so for an exact linear relationship y = a + bx with b ≠ 0, r is +1 or -1 no matter how steep the line is.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
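A direct transcription of that definition in plain Python follows. It is a sketch for illustration, not the code the report itself runs, and the sample lists are made-up values. It also checks the invariance property described above: shifting or positively rescaling either variable leaves r unchanged, negating one variable flips its sign, and an exact line gives r = 1 whatever its slope.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length, at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    # The 1/(n - 1) normalisation appears in numerator and denominator and cancels.
    return cov / math.sqrt(ss_x * ss_y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
# Changing location and (positive) scale of either variable leaves r unchanged ...
r_shifted = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
# ... while a negative scale factor only flips the sign.
r_negated = pearson_r(x, [-b for b in y])
# For an exact line, steep or shallow, r is +1.
r_steep = pearson_r(x, [3 * a + 1 for a in x])
r_shallow = pearson_r(x, [0.2 * a + 1 for a in x])
print(r, r_shifted, r_negated, r_steep, r_shallow)
```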

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
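In other words, ρ is Pearson's formula applied to ranks. The sketch below assumes the usual convention that tied values receive the average of the ranks they span (the default `method="average"` in SciPy's `rankdata`). The helper names are illustrative. The example contrasts ρ and r on a relationship that is monotonic but not linear.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def spearman_rho(x, y):
    """Pearson's r computed on the (average) ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]  # monotonic but nonlinear
print(spearman_rho(x, y), pearson_r(x, y))
```

Because the cubic is strictly increasing, the ranks of x and y coincide and ρ is exactly 1, while r falls below 1 because the points do not lie on a straight line.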

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

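To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. A pair (xi, yi), (xj, yj) is concordant if X and Y order the two observations the same way (xi < xj and yi < yj, or xi > xj and yi > yj), and discordant if they order them oppositely. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2.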
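Written out as a sketch, this is the basic τ-a variant, τ = (C - D) / (n(n - 1)/2), which counts a pair tied in X or Y as neither concordant nor discordant. Library implementations such as SciPy's `kendalltau` default to the tie-adjusted τ-b, so results can differ when ties are present. The function name and sample lists are illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length, at least 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # s == 0: tie in x or y, counted as neither
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 5, 4]
print(kendall_tau_a(x, y))
```

Here 7 of the 10 pairs are concordant and 3 discordant, giving τ = (7 - 3) / 10 = 0.4.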

Phik (φk)

Phik (φk) is a relatively new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The standard empirical estimator of Cramér's V has been shown to be biased, even for large samples, so we use the bias-corrected measure proposed by Bergsma (2013).
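A sketch of the Bergsma (2013) correction, written from the commonly cited form of the estimator: φ² = χ²/n is reduced by (k - 1)(r - 1)/(n - 1), and the table dimensions r and k are shrunk likewise before normalising. It assumes an r × k contingency table given as a list of rows of counts with no all-zero row or column. It is an illustration, not the report's own code, and the function name and tables are hypothetical.

```python
import math

def cramers_v_bias_corrected(table):
    """Bias-corrected Cramér's V for an r x k contingency table (list of rows of counts)."""
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic (no continuity correction)
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Shrink phi^2 and the table dimensions before normalising
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0

independent = [[20, 20], [30, 30]]  # rows have identical proportions
moderate = [[30, 10], [10, 30]]     # uncorrected V would be exactly 0.5
perfect = [[50, 0], [0, 50]]        # each row maps to exactly one column
for name, t in [("independent", independent), ("moderate", moderate), ("perfect", perfect)]:
    print(name, cramers_v_bias_corrected(t))
```

The correction pulls the moderate case slightly below its uncorrected value of 0.5, while the two extremes stay at 0 and 1.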

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
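One common way to quantify nullity correlation, and the approach used by the missingno library's heatmap, is the Pearson correlation between two columns' is-missing indicators (1 = missing, 0 = present). The sketch below does this in plain Python, representing missing values as `None` to stay dependency-free. The toy columns are hypothetical, loosely modelled on the sample rows below, where residential rows carry GBA but no CMPLX_NUM and condominium rows the reverse.

```python
import math

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the is-missing indicators of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        # A column that is always (or never) missing has no nullity variation.
        return float("nan")
    return cov / math.sqrt(var_a * var_b)

# Hypothetical toy columns
gba = [2522.0, 2567.0, 2522.0, None, None, None]
cmplx_num = [None, None, None, 2786.0, 2786.0, 2275.0]
source = ["Residential"] * 3 + ["Condominium"] * 3  # never missing

print(nullity_correlation(gba, cmplx_num))  # whenever one is present, the other is missing
print(nullity_correlation(gba, source))     # undefined: source is never missing
```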

Sample

First rows

Unnamed: 0BATHRMHF_BATHRMHEATACNUM_UNITSROOMSBEDRMAYBYR_RMDLEYBSTORIESSALEDATEPRICEQUALIFIEDSALE_NUMGBABLDG_NUMSTYLESTRUCTGRADECNDTNEXTWALLROOFINTWALLKITCHENSFIREPLACESUSECODELANDAREAGIS_LAST_MOD_DTTMSOURCECMPLX_NUMLIVING_GBAFULLADDRESSCITYSTATEZIPCODENATIONALGRIDLATITUDELONGITUDEASSESSMENT_NBHDASSESSMENT_SUBNBHDCENSUS_TRACTCENSUS_BLOCKWARDSQUAREXYQUADRANT
0040Warm CoolY2.0841910.01988.019723.02003-11-25 00:00:001095000.0Q12522.013 StoryRow InsideVery GoodGoodCommon BrickMetal- SmsHardwood2.052416802018-07-22 18:01:43ResidentialNaNNaN1748 SWANN STREET NWWASHINGTONDC20009.018S UJ 23061 0928938.914680-77.040832Old City 2040 D Old City 24201.0004201 2006Ward 2152-77.04042938.914881NW
1131Warm CoolY2.01151898.02007.019723.02000-08-17 00:00:00NaNU12567.013 StoryRow InsideVery GoodGoodCommon BrickBuilt UpHardwood2.042416802018-07-22 18:01:43ResidentialNaNNaN1746 SWANN STREET NWWASHINGTONDC20009.018S UJ 23067 0928938.914683-77.040764Old City 2040 D Old City 24201.0004201 2006Ward 2152-77.04042938.914881NW
2231Hot Water RadY2.0951910.02009.019843.02016-06-21 00:00:002100000.0Q32522.013 StoryRow InsideVery GoodVery GoodCommon BrickBuilt UpHardwood2.042416802018-07-22 18:01:43ResidentialNaNNaN1744 SWANN STREET NWWASHINGTONDC20009.018S UJ 23074 0928938.914684-77.040678Old City 2040 D Old City 24201.0004201 2006Ward 2152-77.04042938.914881NW
3331Hot Water RadY2.0851900.02003.019843.02006-07-12 00:00:001602000.0Q12484.013 StoryRow InsideVery GoodGoodCommon BrickBuilt UpHardwood2.032416802018-07-22 18:01:43ResidentialNaNNaN1742 SWANN STREET NWWASHINGTONDC20009.018S UJ 23078 0928838.914683-77.040629Old City 2040 D Old City 24201.0004201 2006Ward 2152-77.04042938.914881NW
4421Warm CoolY1.01131913.02012.019853.0NaNNaNU15255.013 StorySemi-DetachedVery GoodGoodCommon BrickNeoprenHardwood1.001320322018-07-22 18:01:43ResidentialNaNNaN1804 NEW HAMPSHIRE AVENUE NWWASHINGTONDC20009.018S UJ 23188 0925338.914383-77.039361Old City 2040 D Old City 24201.0004201 2006Ward 2152-77.04042938.914881NW
5532Hot Water RadY1.01051913.0NaN19724.02010-02-26 00:00:001950000.0Q15344.014 StoryRow InsideVery GoodGoodCommon BrickBuilt UpHardwood1.041121962018-07-22 18:01:43ResidentialNaNNaN1709 S STREET NWWASHINGTONDC20009.018S UJ 23157 0924838.914331-77.039715Old City 2040 D Old City 24201.0004201 2006Ward 2152-77.04042938.914881NW
6610Warm CoolY2.0521917.01988.019572.02011-05-02 00:00:00NaNU11260.012 StoryRow InsideAbove AverageAverageCommon BrickMetal- SmsHardwood2.002412612018-07-22 18:01:43ResidentialNaNNaN1769 SWANN STREET NWWASHINGTONDC20009.018S UJ 23042 0932338.914983-77.041055Old City 2040 D Old City 24201.0004201 2005Ward 2152-77.04042938.914881NW
7731Hot Water RadY2.0841906.02011.019723.02011-09-29 00:00:001050000.0Q12401.013 StoryRow InsideVery GoodAverageCommon BrickMetal- SmsHardwood2.012416272018-07-22 18:01:43ResidentialNaNNaN1746 1/2 T STREET NWWASHINGTONDC20009.018S UJ 23124 0936838.915408-77.040129Old City 2040 D Old City 24201.0004201 2005Ward 2152-77.04042938.914881NW
8831Warm CoolY2.0731908.02008.019672.02018-05-03 00:00:001430000.0Q41488.012 StoryRow InsideAbove AverageVery GoodCommon BrickBuilt UpHardwood2.012414242018-07-22 18:01:43ResidentialNaNNaN1727 SWANN STREET NWWASHINGTONDC20009.018S UJ 23142 0932438.915017-77.039903Old City 2040 D Old City 24201.0004201 2005Ward 2152-77.04042938.914881NW
9911Hot Water RadY1.0621908.01979.019502.02008-12-05 00:00:00NaNU11590.012 StoryRow InsideGood QualityAverageCommon BrickBuilt UpHardwood1.001114242018-07-22 18:01:43ResidentialNaNNaN1733 SWANN STREET NWWASHINGTONDC20009.018S UJ 23127 0932438.915015-77.040081Old City 2040 D Old City 24201.0004201 2005Ward 2152-77.04042938.914881NW

Last rows

Unnamed: 0BATHRMHF_BATHRMHEATACNUM_UNITSROOMSBEDRMAYBYR_RMDLEYBSTORIESSALEDATEPRICEQUALIFIEDSALE_NUMGBABLDG_NUMSTYLESTRUCTGRADECNDTNEXTWALLROOFINTWALLKITCHENSFIREPLACESUSECODELANDAREAGIS_LAST_MOD_DTTMSOURCECMPLX_NUMLIVING_GBAFULLADDRESSCITYSTATEZIPCODENATIONALGRIDLATITUDELONGITUDEASSESSMENT_NBHDASSESSMENT_SUBNBHDCENSUS_TRACTCENSUS_BLOCKWARDSQUAREXYQUADRANT
15894715894720Forced AirYNaN421938.02006.01938NaN2008-06-30 00:00:00320000.0U1NaN1NaNNaNNaNNaNNaNNaNNaNNaN0164972018-07-22 18:01:38Condominium2786.0809.0NaNNaNNaN20001.0NaN38.911840-77.01942Old City 2040 B Old City 24801.0NaNWard 6477-77.01942238.911848NW
15894815894820Forced AirYNaN421938.02006.01938NaN2012-10-22 00:00:00460000.0Q1NaN1NaNNaNNaNNaNNaNNaNNaNNaN0165732018-07-22 18:01:38Condominium2786.0934.0NaNNaNNaN20001.0NaN38.911840-77.01942Old City 2040 B Old City 24801.0NaNWard 6477-77.01942238.911848NW
15894915894911Forced AirYNaN411938.02006.01938NaN2015-06-09 00:00:00550000.0Q6NaN1NaNNaNNaNNaNNaNNaNNaNNaN0166902018-07-22 18:01:38Condominium2786.01123.0NaNNaNNaN20001.0NaN38.911840-77.01942Old City 2040 B Old City 24801.0NaNWard 6477-77.01942238.911848NW
15895015895030Forced AirYNaN531938.02006.01938NaN2015-12-24 00:00:00635000.0U5NaN1NaNNaNNaNNaNNaNNaNNaNNaN0164072018-07-22 18:01:38Condominium2786.01330.0NaNNaNNaN20001.0NaN38.911840-77.01942Old City 2040 B Old City 24801.0NaNWard 6477-77.01942238.911848NW
15895115895131Forced AirYNaN531938.02006.01938NaN2009-11-12 00:00:00389000.0U1NaN1NaNNaNNaNNaNNaNNaNNaNNaN0165022018-07-22 18:01:38Condominium2786.01413.0NaNNaNNaN20001.0NaN38.911840-77.01942Old City 2040 B Old City 24801.0NaNWard 6477-77.01942238.911848NW
15895215895210Forced AirYNaN311938.02006.01938NaN2015-04-03 00:00:00399900.0Q4NaN1NaNNaNNaNNaNNaNNaNNaNNaN0163942018-07-22 18:01:38Condominium2786.0639.0NaNNaNNaN20001.0NaN38.911840-77.01942Old City 2040 B Old City 24801.0NaNWard 6477-77.01942238.911848NW
15895315895310Forced AirYNaN421938.02006.01938NaN2013-10-04 00:00:00416000.0Q1NaN1NaNNaNNaNNaNNaNNaNNaNNaN0165062018-07-22 18:01:38Condominium2786.0820.0NaNNaNNaN20001.0NaN38.911840-77.01942Old City 2040 B Old City 24801.0NaNWard 6477-77.01942238.911848NW
15895415895420Forced AirYNaN421920.02007.01920NaN2008-09-30 00:00:00600000.0U1NaN1NaNNaNNaNNaNNaNNaNNaNNaN0164672018-07-22 18:01:38Condominium2880.01167.0NaNNaNNaN20001.0NaN38.911840-77.01942Old City 2040 B Old City 24801.0NaNWard 6477-77.01942238.911848NW
15895515895510Warm CoolYNaN201965.0NaN1965NaN2015-04-14 00:00:00215100.0Q3NaN1NaNNaNNaNNaNNaNNaNNaNNaN0173322018-07-22 18:01:38Condominium2275.0447.0NaNNaNNaN20024.0NaN38.872953-77.01823Southwest WaterfrontNaN11000.0NaNWard 6504-77.01823238.872961SW
15895615895610Warm CoolYNaN201965.0NaN1965NaN2002-07-22 00:00:00NaNU1NaN1NaNNaNNaNNaNNaNNaNNaNNaN0173322018-07-22 18:01:38Condominium2275.0447.0NaNNaNNaN20024.0NaN38.872953-77.01823Southwest WaterfrontNaN11000.0NaNWard 6504-77.01823238.872961SW